STATISTICAL 
INFERENCE 




Statistical Inference 


HELEN M. WALKER


Professor of Education 
Teachers College, Columbia University 


JOSEPH LEV 


Senior Statistician 
Department of Civil Service 
State of New York 




HOLT, RINEHART AND WINSTON 


NEW YORK 


CHICAGO 
TORONTO 


SAN FRANCISCO 




Copyright, 1953 
by 
HOLT, RINEHART AND WINSTON, Inc. 


Library of Congress Catalog Card Number: 52-13908


29187-0513 


Printed in the United States of America 


Preface 


A person using statistical data finds himself beset by many 
uncertainties. Usually, in dealing with groups which have not been 
studied, he wishes to use conclusions based on the study of one 
group. How reliable are such conclusions? On what basis shall 
he choose among the various available measures of average, spread, 
and relationship? How large a sample shall he take and what 
design shall he follow in selecting his cases? 

Answers to such questions are considered in this book from the 
unified point of view of the relation of sample to population. From 
this point of view, the statistical problems turn out to be matters 
of testing hypotheses, estimation, and experimental design. Per- 
sons working with statistical data often think that their interest 
is only in an analysis of the data at hand. In most cases, however, 
further reflection shows that the interest in the data at hand is due 
to its bearing on situations or groups other than those actually 
observed. 

Because of this point of view, the concept of inference is in- 
troduced in the very first chapter by use of simple examples. By 
wholly intuitive reasoning the reader is led to make his own in- 
ferences without any complex mathematics, or even tables. The 
next two chapters deal with problems from a population of only 
two classes. In this simple context are developed most of the con- 
cepts of statistical inference which are used throughout the book. 

The intuitive approach is used to introduce all concepts. Many 
simple numerical examples and graphic devices help to maintain 
the clarity of exposition. Per cents and proportions are considered 
first because they are the simplest summary measures with which 
one deals. Means, standard deviations, and a variety of measures 
of simple and multiple relationship are introduced later, together 
with their sampling theory. 

Diverse experimental designs are considered. The simplest is 
the comparison of two proportions and of two means. More com- 
plex designs are described under the headings of analysis of variance 
and analysis of covariance. The use of the power function in experi- 
mental design, still a novelty in textbook literature, is described. 

Elementary computations are introduced as needed in the text,
for example, reading the normal curve in Chapter 2, computation
of mean and standard deviation of a sample in Chapter 7, computa- 
tion of regression and correlation in Chapter 10. Consequently, no 
previous course is needed by the person who has good facility in 
quantitative thinking and who handles symbolism with ease. How- 
ever, the person who works with difficulty in such materials will 
profit by having had a slow-paced introductory course in which he 
works with descriptive statistics and masters computational skills.
(Students have a remarkable ability to appraise their own
competences and needs in these matters.)

The book does not presuppose any college training in mathe- 
matics and does not present mathematical derivations of formulas. 
At first glance it may look more mathematical than most texts 
intended for the non-mathematician because of its use of probability 
symbolism, multiple subscripts, and the expressed limits of summa- 
tion. However, the reader is given careful explanation and practice 
exercises related to these usages. We are convinced that the effort 
of learning these symbols is more than repaid by increased clarity of 
understanding. While the text is planned for the non-mathematical 
reader, we believe that a mathematically competent student who 
intends to study the mathematical theory of statistics will profit 
by such an introductory course which consistently attempts to de- 
velop concepts in relation to real problems. 

The diversity of techniques used by the research workers in 
any field is today so great that very careful selection must be made 
in order to cover the most essential in a two-semester course. 
However, many students are able to take only a one-semester course
and subsequently find themselves in need of methods other than 
those studied, for which they must usually consult sources using 
unfamiliar style and symbolism. If the first 9 chapters of this 
book are covered in a one-semester course, the student will have 
acquired understanding of the basic concepts and will be in posses- 
sion of a text in which he can readily locate other techniques as he 
finds need for them. If more class time is available, the book pro- 
vides material appropriate for a two-semester course. 

The first 10 chapters are so tightly organized that little change 
in the order of presentation is advisable. Any of the later chapters 
can be omitted, or the order shifted, without great disadvantage. 
If a teacher wishes to include correlation (Chapter 10) in a one- 
semester course, the following sections might be omitted without 
affecting the clarity of the remaining material: 


Two by Two Table in Very Small Samples           pages 103-106
Test of Normality                                pages 119-123
Design of Samples in Surveys                     pages 171-177
Bartlett Test for Homogeneity of Variance        pages 192-195
Comparison of Means of Several Measures
   of the Same Individual                        pages 216-229


Many of the exercises are developmental in nature and should 
not be omitted. Answers are provided in order that students may 
appraise their own progress. Additional exercises and review 
questions will be provided by the publishers at nominal cost. 

In a course in statistical inference for non-mathematical students 
a crucial problem is the treatment of probability. One familiar 
approach is to give careful instruction in the routine of reading 
tables without attempting to develop any real understanding of 
the meaning of the values obtained from such reading. Another is 
to give considerable instruction in combinatory probabilities and 
the binomial distribution for its own sake using problems of the 
type commonly found in college algebra texts, without making clear 
the relevance of such work to concrete problems. The first approach 
admittedly leaves the student rigid, unable to make even slight 
adaptations in the methods he has been taught, because he does not 
really understand them. The second often frightens the
non-mathematician out of the course. We have tried to steer clear of
both pitfalls by an intuitive development of probability concepts 
as an aid to the solution of statistical problems. 

A decimal subscript attached to a statistic always relates to the 
percentile rank of that statistic, not its significance level. In this 
departure from tradition we have followed the example of Dixon 
and Massey. Experience with the new practice in six classes con- 
vinces us that it obviates the confusion produced under the usual 
practice by which t and z are given subscripts based on a two-sided
region and χ² and F subscripts based on a one-sided region.
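
The percentile-rank convention can be illustrated numerically. The sketch below is not part of the original text; it assumes Python's standard `statistics.NormalDist`, and the helper name `z_sub` is hypothetical. Under this convention the subscript .975 marks the value below which 97.5 per cent of the standard normal distribution lies, so a two-sided 5 per cent region is bounded by the values with subscripts .025 and .975.

```python
from statistics import NormalDist


def z_sub(p: float) -> float:
    """Return the standard normal deviate whose percentile rank is p.

    The subscript is read as a cumulative proportion (percentile rank),
    not as a significance level.
    """
    return NormalDist().inv_cdf(p)


# Bounds of the central 95 per cent of the standard normal distribution.
lower = z_sub(0.025)
upper = z_sub(0.975)
print(round(lower, 2), round(upper, 2))
```

Because the subscript is always a percentile rank, the same reading applies to a one-sided region: the upper 5 per cent point is simply the value with subscript .95.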

Data for illustrative purposes have been taken from the pub- 
lished work of many persons and to each of these, named at the 
places where their data are introduced, we express our thanks. 

A modern text presents statistical theory and methods origi- 
nated by a large number of persons and discussed by various succes- 
sive writers each of whom adds some new facet to the development, 
so that a proper acknowledgment of the sources of the authors’ 
ideas would require an extensive historical analysis not appropriate
for a text of this type. An unfortunate concomitant of this situation
is the tendency of some readers to ascribe credit to a textbook
writer for everything he presents. In this as in other texts, only a 
few formulas were originated by the authors and these few are not 
identified. References placed at the ends of chapters are selected 
not so much to acknowledge original sources as to acquaint the 
reader with additional material he is likely to find helpful. The 
authors gladly recognize their indebtedness to all the great writers 
and teachers of statistical theory, from whose work they have 
profited beyond all possibility of acknowledgment. 

Certain tables and charts have by permission been reproduced in 
part or in entirety from the publications of Z. W. Birnbaum, C. J. 
Clopper, H. A. David, F. N. David, R. A. Fisher, J. C. Flanagan, 
T. L. Kelley, F. J. Massey, M. Merrington, F. Mosteller, E. S.
Pearson, N. Smirnov, G. W. Snedecor, C. M. Thompson, and 
F. Wilcoxon. The usefulness of any modern text depends in large 
measure upon the generosity of the authors and publishers of such 
tables and charts. Still other tables such as those of square roots, 
logarithms, and the normal curve have become almost common 
property and specific acknowledgment of the computers is well nigh 
impossible. 

For assistance in checking computations and in making the 
answer key we are glad to acknowledge the help of Ying Chan, 
Philip Jackson, and Herman Ravitch. For expert typing of difficult 
material in three preliminary mimeographed editions and preparing 
the final copy for the printer, we owe much to Gertrude Ramsdell. 
The criticisms and suggestions of the students who struggled through 
three mimeographed versions are responsible for some of the helpful 
features of the text, but these students are too numerous to name 
individually. We are particularly indebted to Lincoln Moses, who 
wrote the final chapter, and who, as the result of his experience in 
using the third mimeographed version in one of his classes, pointed 
out several places which needed clarification. 


New York City H. M. W. 


Albany, N.Y. J. L.
May 15, 1953 


Table of Contents 


1 INFERENCES BASED ON SIMPLE EXPERIMENTS 


The nature of statistical inquiries; Experiments in judging intelli- 
gence from photographs; Statistical hypotheses and their verifica- 
tion. 


2 PROBABILITY DISTRIBUTIONS 


Observations; Variables; Populations defined on a discrete variable; 
Random observations; Probability; Probability distribution; Sym- 
bolism; Mutually exclusive classes; Exhaustive classes; Independ- 
ence of observations; The probability of two or more occurrences; 
Factorial; A population of two classes; Parameter and statistic; 
The binomial distribution; The graphic representation of the bi- 
nomial distribution; Mean of the probability distribution for a dis- 
crete variable; Variance and standard deviation of the probability 
distribution for a discrete variable; Mean and standard deviation 
of a sample; Mean and standard deviation of the binomial distribution;
Approximation to the binomial distribution; The normal
distribution; Reading a table of normal probability; Computing
binomial probabilities by use of normal probability tables; The
closeness of approximation of the normal curve to the binomial.


3 INFERENCES CONCERNING PROPORTIONS 


Test of hypothesis P = .5 by use of the binomial distribution; Tests 
of several hypotheses on P; Estimation of P; Estimation by a 
single value; Interval estimate; Interval estimate for P when 
N = 10; Location of confidence limits for samples of varying size 
by the use of charts; Confidence intervals for other parameters; 
Probability and confidence; Aids available for computing binomial 
probabilities; One-sided and two-sided tests of hypotheses; Two 
kinds of error; Choice of critical region; Large sample tests con- 
cerning proportions; Confidence limits for proportions computed 
from large samples; Number of cases needed to obtain a confidence 
interval of a given width; Sampling from a finite population; 
Sample size in tests of hypotheses; Test of hypothesis that two 
population proportions are equal when each is estimated from a 
large number of observations and when the estimated common pro- 
portion is not near 0 or 1. 


4 CHI-SQUARE 


Populations consisting of several discrete classes; Sampling dis- 
tribution of response types; Chi-square as a measure of discrepancy 
between observed and expected frequencies; Exact probabilities of 
chi-square obtained by enumeration; Chi-square curves and table; 
Degrees of freedom; Reading the chi-square table; Comparison of
observed and theoretical frequencies by means of χ²; Computing
chi-square from data in form of per cents; Tests of independence
in contingency tables; Relation of χ² to the statistic of Formula
(3.12); Comparison of two proportions based on the same individuals;
The two-by-two contingency table in very small samples;
Use of the chi-square approximation when expectations are small. 


5 POPULATIONS AND SAMPLES ON A CONTINUOUS VARIABLE 


Mathematical model of the population; The mean and variance of a 
population; Normal populations; Use of summation sign; Defini- 
tion of mean, variance, and standard deviation for a sample; Formu- 
las for the computation of the mean, variance, and standard devia- 
tion for a sample; Sampling distributions; Unbiased estimate of a 
parameter; Test of normality applied to a distribution of sample 
variances. 


6 SAMPLING DISTRIBUTIONS 


Purposes of the chapter; The data; Use of table of random sam- 
pling numbers; A class project in drawing samples, computing cer- 
tain statistics and obtaining the random sampling distributions of 
those statistics; Samples from a normal population; Independence 
of statistics; Degrees of freedom; Five important distributions; 
Statistics which have a normal distribution; Statistics which have
a chi-square distribution; Statistics which have "Student's" dis- 
tribution; Statistics which have the F distribution. 


7 INFERENCES CONCERNING THE MEAN OR THE DIFFERENCE 
BETWEEN TWO MEANS 


The assumption of a normal population; The central limit theorem; 
X̄ as an estimate of μ; Tests about μ when σ is known; "Student's"
distribution; Tests about μ when σ is unknown; Confidence interval
for μ; Mean of a population of differences between two measures for
each individual; Test of hypotheses about μ₁ − μ₂; Alternatives
to a hypothetical mean; Choice of critical region; The number of
cases needed to reject a hypothesis concerning μ when some
alternative is true; Standard error of X̄ in samples from finite
population; Design of samples in surveys; Stratified sampling; Cluster
sampling. 


8 INFERENCES CONCERNING VARIANCES AND STANDARD DEVI- 
ATIONS OF NORMAL POPULATIONS 

Sampling distribution of the statistic (N − 1)s²/σ²; Interval
estimate for the variance; The chi-square distribution when the
number of degrees of freedom is large; Large sample estimate for
the standard deviation; Ratio of the variances of two independent
samples from populations with the same variance; Tables of the F
distribution; Relation of the F table to tables of "Student's"
distribution, the chi-square distribution and the normal distribution;
Comparison of two variances based on related scores; The variances 
of several samples from populations having the same variance. 




9 ANALYSIS OF VARIANCE 196 


Use of double subscripts; Comparison of three means; Mathematical
model; Hypothesis to be tested; Estimate of σ² from variation
among means; Estimate of σ² from variation within groups;
The variance ratio; Analysis of the archery data; Critical region;
Form of the F distribution; Algebraic relations in analysis of
variance; Computation procedures applicable when subgroups are
not of uniform size; Comparison of means of several measures of
the same individuals; Estimate of σ² based on error; Estimate of
σ² obtained from variation between row means; Estimate of σ²
obtained from variation between column means; Hypotheses and
procedure for testing them; Additive nature of component sums 
of squares; Computations when original scores are not available; 
Factors influencing F. 


10 LINEAR REGRESSION AND CORRELATION 230


Linear regression; The correlation coefficient; Machine computation
of b_yx and r_xy without the use of a scatter diagram; Computation
of b_yx and r_xy from a scatter diagram; A mathematical model
for linear regression; Sampling distributions of statistics in linear
regression; Confidence interval for μ_y, β_yx, σ²_y.x, and μ_y.x;
Component sums of squares in regression; Test for linearity of
regression; Bivariate population model; Normal bivariate population;
Distribution of the correlation coefficient; Tests of the hypothesis
that correlation is zero in the population; Tests that ρ is some
value other than zero; Confidence interval for ρ; Test of the
hypothesis that two independent populations have the same
correlation; Test of the hypothesis that ρ_yx = ρ_zx when computed
for the same population; Regression equations in the normal
bivariate population; Additional
formulas often useful in computations for correlation and regression.


11 OTHER MEASURES OF RELATIONSHIP 261 


Biserial correlation; The point biserial coefficient of correlation; 
Test of significance for point biserial correlation; The biserial 
coefficient of correlation; Sampling theory of the biserial correlation 
coefficient; Comparison of biserial (r_b) and point (r_pb) coefficients;
Fourfold correlation; The Phi coefficient; Tetrachoric correlation;
Comparison of Phi and Tetrachoric r; Estimating correlation from
the tails of the distribution; The correlation ratio; Correlation 
among ranks; Test of significance for rank order coefficient; Esti- 
mation of product moment coefficient from rank order; Relation 
among ranks given by several judges; The Tau coefficient; Rela- 
tion between two traits expressed in qualitative categories. 


12 THE STATISTICS OF MEASUREMENT 289 


Symbols; Components of an observed score; Effect of measurement 
error on the mean; Effect of measurement error on the variance; 
Indirect estimate of the “true” variance; Indirect estimate of the 
error variance; Effect of measurement error on a coefficient of corre- 
lation; The coefficient of reliability; Effect on the coefficient of 
reliability of changing the length of the test; Effect on the correlation
between two measures of changing their reliabilities; Effect
of measurement error on a regression coefficient; Effect of measure- 
ment error on tests of significance; Method of estimating a relia- 
bility coefficient from data. 


13 MULTIPLE REGRESSION AND CORRELATION 315 


Prediction of semester grade in a first course in statistics; Simplified 
problem with two predictors; Multiple regression equation with
two predictors; Effectiveness of prediction by a multiple regression
equation; Multiple correlation; Partition of the sum of squares;
The normal equations; The Doolittle method of solving the normal
equations by successive elimination; Checks; Fisher modification of 
the Doolittle method; Regression equation for a different criterion 
and the same predictors; Tests of significance for partial regression 
coefficients; Elimination of one predictor from the regression equa- 
tion; Partial correlation; Test of significance for a partial correla- 
tion; Relation of partial and multiple correlation coefficients; Con- 
sistency among coefficients; Iterative method of obtaining regression 
weights. 


14 ANALYSIS OF VARIANCE WITH TWO OR MORE VARIABLES OF 
CLASSIFICATION 348 


Two bases of classification with n individuals in each cell; Sub- 
division of sum of squares; Orthogonal comparisons; The estima- 
tion of error; Mathematical models for the two-way layout; 
Experimental procedure replicated on each of several individuals; 
Sources of variation; Tests of significance for interactions; Choice 
of error variance for the main effects; Latin square and Graeco- 
Latin square; Application of Graeco-Latin square to study on mem- 
orizing music; Latin square pattern with several observations in 
each cell; Unequal frequencies in the subclasses; Two samples 
matched in subgroups on a related trait. 


15 ANALYSIS OF COVARIANCE 387 


Symbolism in the analysis of covariance; Several Populations with 
one Predictor Variable. Numerical computation of C values; Com- 
parison of regression within the groups with regression for the com- 
bined group; Test of hypothesis of common slope; Test of the 
hypothesis β_w = 0; Measure of sampling variability; Test of hypothesis
that a single regression line fits all populations; Total sum of
squares; Regression among the means; Comparison of the adjusted 
means; Matched Regression Estimates. Two Populations with One 
Predictor Variable. Significance of the difference Y₁ − Y₂ for a
particular value; Region of significance; Several Populations with
Two Predictor Variables; Matched Regression Estimates. Two 
Populations with Two Predictor Variables. Region of significance. 


16 PERCENTILES 413 


Notation for percentiles; Quantiles; Distribution of percentiles 
in large samples; Consistency of estimate; Efficiency of estimate; 
Classes of efficient statistics; Estimation of the mean and standard
deviation by percentiles; Estimation of the mean and standard
deviation from item analysis data; Estimation of the correlation 
coefficient. 


17 TRANSFORMATION OF SCALES


Transformation of proportions into angles; The Square Root
transformation; The Logarithmic transformation; Transformation of
ranks to normal deviates; Normalization of the F distribution; 
Uniformization. 


18 NON-PARAMETRIC METHODS 


A. Tests for Comparison of Two Samples 
Kolmogorov-Smirnov test; Run test; The sign test; Extension
of the sign test; Signed rank test for paired observations; Sum 
of ranks; Median test for two samples. 

B. Comparison of k samples 
Median test for k samples; Sum of ranks for comparing k 
samples; Analysis of variance by ranks. 

C. Confidence Intervals 
Confidence interval for the median; Kolmogorov-Smirnov
confidence band for cumulative frequency; Comparison of
Kolmogorov-Smirnov method with χ²; Confidence interval for μ_x − μ_y;
Graphic method of obtaining confidence interval for E(d).

D. Tests of Independence 
Corner test of association; Ethnocentrism score. 


APPENDIX 
List of Tables and Charts in the Appendix 
Tables and Charts 
Glossary of Symbols 
Answers to Problems 
Index to Subject Matter 
Index of Authors 




1 Inferences Based on Simple Experiments


It is the purpose of this chapter to introduce the reader to 
some of the simpler aspects of statistical inference in an intuitive 
way. Before describing the basis for and the logic of such
inference some remarks on the nature of statistical inquiry may
be appropriate. 

The Nature of Statistical Inquiries. Problems calling for 
statistical treatment always involve empirical, or observed, evi- 
dence but not all problems involving empirical evidence are 
statistical. Factual information about a single individual is not 
statistical information. When was Shakespeare born? Is John 
taller than Sam? What does this load of coal weigh? Did the 
prisoner go to his home before or after he had been with the 
murdered man? These are factual but not statistical questions. 

A statistical question always relates to a group of individuals 
rather than to a single individual and asks what is true of the group. 

Statistical inquiries are of two types. One type of inquiry
calls only for a description of the group of individuals actually 
observed. Summary measures, or statistics, such as per cents, 
averages or measures of variability are computed from the ob- 
servations made on members of the group. The statistics are 
then used for description of this particular group, but they are 
not used as foundation for a general theory applicable to similar 
individuals that have not been examined. 

A second type of statistical inquiry is more characteristic of 
scientific investigation. It involves a search for principles which 
have some degree of generality. If an investigation deals with 
matters of any general interest, the findings are usually applied 
to a much larger domain than the cases actually observed. We 
may call that larger domain the universe or the population and 
may call the group of cases observed the sample. Now the group 
characters of the sample, that is the statistics obtained from the 
sample, are to be used to obtain information concerning the
unknown group characters of the population. Such generalization
from sample to universe is statistical inference, as contrasted with 
descriptive statistics which apply only to the group from which
they are derived. 

General conclusions inferred from limited sets of observations 
are necessarily uncertain. When we reach a conclusion by in- 
ference from sample data we do so at the risk of being in error. 
This risk can be expressed as a probability and can be given a 
numerical value. It is the purpose of this book to describe methods 
which lead to valid inferences and to calculate the risk of error 
involved in those inferences. This book is concerned with prin- 
ciples and methods used in statistical inquiries from which general 
principles can be derived. 

Experiments in Judging Intelligence from Photographs. A 
good many people are convinced that they are able to judge the 
intelligence of a person by simply looking at his photograph. 
This ability seems to be of a kind which lends itself to study by 
experimental methods. 

A simple experiment might be to present to the subject two 
photographs of persons whose intelligence has been well deter- 
mined. In order to reduce the effect of extraneous cues, these 
should be photographs of two persons of the same age and sex,
similarly dressed and similarly posed. The subject is then asked to
arrange the photographs in order of intelligence. We may ask 
how conclusive the outcome of this experiment will be. Clearly 
there are only two possible outcomes. Either he arranges them 
correctly, or he arranges them incorrectly. Suppose he arranges 
them incorrectly. Then surely he has failed to demonstrate 
ability to judge intelligence from photographs. Suppose, how- 
ever, he arranges the photographs correctly. Are we then to 
accept the conclusion that he can judge intelligence correctly from 
photographs? 

A difficulty in accepting this conclusion is the important part 
which chance could play. If the subject did not look at the photo- 
graphs at all but only at their backs, or if the photographs were 
marked A and B and he chose one letter as indicating the more 
intelligent person without looking at the photographs at all, we 
might say that he had made a chance arrangement. Under these 
circumstances the two possible outcomes (both photographs right 
or both wrong) are equally likely. The probability of each result 
by chance is therefore said to be 1/2. Since the probability of
reaching a correct result by chance is so great, when a correct result
has been reached one can have little confidence that it was due 
to some cause other than chance. 

The experiment just described is evidently not sufficiently
extensive to reveal satisfactory evidence of ability even when it
exists. The probability of obtaining a correct result by chance 
alone can be reduced by using more photographs. If three photo- 
graphs, A, B, and C, are used there are six possible orders in which 
they can be arranged, namely, ABC, ACB, BAC, BCA, CAB and 
CBA. Only one of these orders is correct. Let us assume that 
order to be ABC. Under the hypothesis that chance alone oper- 
ates in the arrangement, each order has the same probability 
and that probability is 1/6 = .167. Consequently, if the person
whose ability is being questioned arranges these photographs cor- 
rectly, we would have some confidence that the outcome is not due 
to chance. However, the probability of arriving at a true answer 
on a chance basis is still greater than that which statisticians 
accept as desirable. 

Before proceeding with a discussion of a more extensive ex- 
periment it is worth while examining the six arrangements a little 
further. Here a correct placement will be indicated by a capital 
letter, an incorrect placement by a lower case letter. 


One arrangement is entirely correct: ABC 

Three arrangements have one correct placement: Acb, 
baC, and cBa. 

Two arrangements have no correct placement: cab and bca.

Therefore the probability of three correct placements by
chance is 1/6; of two correct is 0; of one correct is 3/6; of
none correct is 2/6.


The elementary outcomes have now been grouped in accord- 
ance with the number of correct placements, and to each number 
of correct placements has been attributed a probability. The 
placements and their probability may be exhibited as follows: 


Number of correct          Probability
   placements
       3                   1/6 = .167
       2                   0
       1                   3/6 = .500
       0                   2/6 = .333


This is an example of a probability distribution of which more 
will be said in the next chapter. 

Consider now an experiment with four photographs. There 
are 24 possible arrangements of which only one is completely 
accurate. Under the hypothesis that chance only operates, the 
probability of a completely correct arrangement is 1/24 or .042.
Consequently one would be inclined to give considerable credence 
to the idea that a subject who can arrange four photographs in 
the same order of intelligence as that of the persons photographed 
does have ability to judge intelligence from photographs. 

In Table 1.1 are shown the 24 possible arrangements of four 
pictures grouped in accordance with the number of photographs 
in correct position. Capital letters show photographs in correct 
position, small letters show photographs in incorrect position.
The probability which corresponds to each group is shown in the 
column at the right. 


TABLE 1.1 Classification of All Possible Arrangements of Four Photographs,
A, B, C and D

Number                                   Number of
correctly   Possible arrangements        arrangements   Probability
placed

    4       ABCD                               1        1/24 = .042
    3                                          0        0
    2       ABdc  dBCa  AcbD                   6        6/24 = .250
            AdCb  cBaD  baCD
    1       Acdb  bdCa  cBda  cabD             8        8/24 = .333
            Adbc  daCb  dBac  bcaD
    0       badc  cadb  dabc                   9        9/24 = .375
            bdac  cdab  dcab
            bcda  cdba  dcba
  TOTAL                                       24        1.000
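
The enumeration in Table 1.1 can be checked mechanically. The following is a minimal sketch, not part of the original text; it assumes Python, and the function name `placement_distribution` is hypothetical. It lists every possible order of n photographs, counts how many photographs fall in their correct position in each order, and converts the counts to probabilities under the hypothesis that chance alone operates.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def placement_distribution(n: int) -> dict[int, Fraction]:
    """Probability of each number of correct placements among n photographs.

    Every one of the n! arrangements is treated as equally likely,
    which is the hypothesis of pure chance.
    """
    correct = tuple(range(n))  # the true order of the photographs
    counts = Counter(
        sum(placed == true for placed, true in zip(order, correct))
        for order in permutations(correct)
    )
    total = sum(counts.values())  # n! arrangements in all
    # Report every possible count from n down to 0, including impossible ones.
    return {k: Fraction(counts.get(k, 0), total) for k in range(n, -1, -1)}


for k, p in placement_distribution(4).items():
    print(k, p, f"{float(p):.3f}")
```

For n = 4 the probabilities agree with the right-hand column of the table (.042, 0, .250, .333 and .375, printed as fractions in lowest terms); calling the function with n = 3 gives the distribution for three photographs.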


With five photographs the number of possible arrangements 
would be 120, only one of which is entirely correct. Under the 
hypothesis of pure chance all these 120 arrangements are equally 
likely and so the probability of achieving a correct arrangement 
by chance is 1/120 = .008.

We might go on considering the use of larger and larger num- 
bers of photographs. However, one purpose of this chapter is to 
present simple situations in which the reader can without too 
much difficulty enumerate the possible outcomes of an experiment.
As the number of photographs increases such enumeration
becomes exceedingly tedious. For 6 there are 720 possible ar- 
rangements. Therefore we shall carry this particular type of 
experiment no further. 
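
The growth in the number of arrangements, and the corresponding fall in the probability of a completely correct arrangement by chance, can be tabulated with a short sketch. It is an illustration only, assuming Python; the name `chance_of_perfect_order` is hypothetical.

```python
from math import factorial


def chance_of_perfect_order(n: int) -> float:
    """Probability that n photographs are all placed correctly by chance.

    There are n! equally likely arrangements and only one is correct.
    """
    return 1 / factorial(n)


# Number of photographs, number of arrangements, chance probability.
for n in range(2, 7):
    print(n, factorial(n), round(chance_of_perfect_order(n), 3))
```

The rows for two to five photographs repeat the values given in the text (.5, .167, .042, .008), and the last row shows the 720 arrangements for six photographs.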

Statistical Hypotheses and Their Verification. The proce- 
dures and modes of reasoning just described will be used so widely 
in the book that it is desirable to formulate them in general terms. 

For convenience the general formulation of the procedure and 
its application to the preceding problem will be displayed in 
parallel columns as a series of steps. 


General formulation                     Application to previous experiment

1. An empirical population is           1. A population of photographs of
   defined.                                persons with varying intelligence
                                           is defined.

2. A hypothesis about the               2. The hypothesis is formulated
   population is formulated.               that any effort by the experimenter
                                           to arrange photographs in relation
                                           to intelligence will provide a
                                           result which is no better than a
                                           chance arrangement.

3. The consequences of the              3. In view of the hypothesis the
   hypothesis are formulated. For          probability distribution in Table
   this purpose a character of the         1.1 is obtained. This is the
   sample called a statistic is            distribution of the statistic,
   formulated and the probability          “number of photographs in
   distribution of the statistic           correct position.”
   over all possible samples is
   determined. This distribution is
   called a sampling distribution.

4. A random sample of elements          4. A random sample of four
   from the population is drawn and        photographs is drawn, and the
   observations are made on those          photographs are arranged by the
   elements.                               experimenter in what appears to
                                           him to be the correct order of
                                           intelligence of the persons
                                           photographed.

5. The value of the statistic is        5. The number of objects in correct
   observed in the sample.                 position is counted. This number
                                           is the observed value of the
                                           statistic.

6. The hypothesis is tested by          6. If the experiment results in all
   comparing the value of the              four photographs correctly
   statistic in the observed sample        arranged, then Table 1.1 shows
   with the probability distribution       that such a result has probability
   for all possible samples. If the        of only .042 under the hypothesis.
   observed statistic has little           Since this usually is considered a
   likelihood of occurring the             low probability the hypothesis
   hypothesis is rejected. Otherwise       is rejected.
   the hypothesis is accepted.


Now let us consider a different design which might have been 
employed. Suppose pairs of photographs are obtained such that 
each pair shows two persons of the same sex and same age, sim- 
ilarly dressed and similarly posed, but one of the two persons has 
an intelligence quotient under 85 and the other an intelligence 
quotient over 120. Some pairs are male, some female; some are 
children, some adolescents, some adults. Suppose that many 
such pairs of photographs are available, and from this supply an 
experimenter selects five pairs by some random process, and 
fastens each pair on a large cardboard. To decide which picture 
shall be placed at the right, he tosses a coin. The test is then 
administered by a different person who does not know which pic- 
ture is which, lest unconsciously he might give the subject a cue. 
Under each picture is an identifying number. Looking at each 
pair of pictures in turn, the subject tries to decide which picture 
belongs to a person with high intelligence and he records the 
number of that picture. For each pair on which he makes a cor- 
rect judgment his paper is scored +, for each pair on which he is 
in error it is scored —. 

The number of possible records is 2^5 = 32. These are listed 
in Table 1.2. If each of the 32 records is equally probable, each 
has probability 1/32 = .031. Thus the probability of a perfectly 
correct result by chance is .03. Suppose the .05 level of proba- 
bility has been agreed upon before the experiment was under- 
taken. A record of 5 correct choices has probability less than 
.05 and therefore gives sufficient evidence for rejecting the hy- 
pothesis that the choices were made by chance. A record of fewer 
than five correct choices is insufficient to reject the chance hy- 
pothesis. 
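
The count of records underlying Table 1.2 can be reproduced by listing every possible record of + and - marks. The sketch below is illustrative only; the names `record_distribution` and `rejects_chance` are assumptions, not the authors' notation. The decision rule is written in terms of the probability of a record at least as good as the one observed, which is an assumption made here so that the rule agrees with the text's statement that only five correct choices justify rejection.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def record_distribution(n_pairs):
    """Return {number right: (frequency, probability)} for n_pairs choices.

    Assumes each choice is as likely to be right (+) as wrong (-), so
    all 2**n_pairs records are equally probable.
    """
    records = list(product("+-", repeat=n_pairs))
    counts = Counter(record.count("+") for record in records)
    total = len(records)
    return {k: (counts[k], Fraction(counts[k], total)) for k in range(n_pairs + 1)}


def rejects_chance(number_right, n_pairs, level=Fraction(5, 100)):
    """True if a record this good or better has probability below `level`.

    Using "this good or better" is an assumption of this sketch.
    """
    dist = record_distribution(n_pairs)
    tail = sum(dist[k][1] for k in range(number_right, n_pairs + 1))
    return tail < level


if __name__ == "__main__":
    for k, (freq, prob) in sorted(record_distribution(5).items(), reverse=True):
        print(k, freq, float(prob))
    print(rejects_chance(5, 5), rejects_chance(4, 5))
```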




TABLE 1.2 Probability Distribution of the Number Right out of Five Choices
when Each Choice Is as Likely to be Right as to be Wrong
(+ indicates a right choice, - a wrong one)

Number of                                               Probability if
right      Record                           Frequency   all records are
choices                                                 equally probable

5          +++++                                 1       1/32 = .0312

4          ++++-  +++-+  ++-++  +-+++  -++++     5       5/32 = .1563

3          +++--  ++--+  ++-+-  +-++-  +-+-+    10      10/32 = .3125
           +--++  -+++-  -++-+  -+-++  --+++

2          ++---  +-+--  +--+-  +---+  -++--    10      10/32 = .3125
           -+-+-  -+--+  --++-  --+-+  ---++

1          +----  -+---  --+--  ---+-  ----+     5       5/32 = .1563

0          -----                                 1       1/32 = .0312

TOTAL                                           32             1.0000


2 Probability Distributions 


A more formal treatment of probability and its relation 
to statistical inference will be presented in this chapter. 

Observations. In any statistical study it is essential to ob- 
serve some character of the objects under consideration and to 
make a record of that observation. Suppose, for example, that 
the character is age and that the objects are persons. The record 
of "observation" may take the form of a numerical measure such 
as “age in years at last birthday”; it may be merely the assigning 
of an individual to the appropriate one of a series of age categories; 
or it may be merely the record + or — in answer to such a question
as “Is he of voting age?” The individuals in a group may 
be ranked in order of age and the numbers 1, 2, 3, ..., N assigned 
to them, and these ranks will also be called observations. Thus 
the term observation is a quite general term which includes such 
more specific terms as measure, score, category, rank, presence or 
absence of a trait. An observation is a record of information 
about an individual regardless of the form of that record. Some- 
times for the sake of brevity an individual observation may be 
called “an individual.” 

Variables. When an observation is made on a characteristic 
of an individual, it is expected that one of several possible values 
of the characteristic will be observed. The values may be cate- 
gorical, as when observations are made on sex; or may be scaled, 
as observations on age; or ordered, as A is greater than B and B 
is greater than C. A characteristic which may take several values 
is called a variable. 

Two kinds of variables are considered in statistics, discrete 
and continuous. If a variable can take only a finite set of values 
it is called discrete. Variables like sex, marital status, or number 
of children in a family are discrete variables. 

Variables like height and weight are considered continuous be- 
cause they may take any value in a continuous interval of measure- 
ment. Because of the limitations of measuring instruments, the 
actual set of possible measures is finite, even in measurements 
on a continuous variable. Thus, in measurements on intelligence 
only a finite number of test scores can materialize. Nevertheless, 
it is convenient to deal with such variables as measures of a con- 
tinuous underlying trait. 

Sometimes it is convenient to deal with a continuous variable 
as if it were discrete. Thus one may divide ages of persons into 
groups, as, for example, ages below 40, ages 40 to 59, and ages 
60 or above. 

The first four chapters of this book deal mainly with discrete 
variables. Beginning with Chapter 5 considerable attention will 
be given to continuous variables. 

Populations Defined on a Discrete Variable. The set of ob- 
servations made in a statistical investigation may be considered as 
a sample from a larger, perhaps infinitely large, group of elements 
commonly called the population or the universe, and sometimes 
called the supply. When observations are made on a discrete 
variable, the values of the variable may be thought of as classes 
into which the totality of elements is distributed. From this point 
of view of the population an observation is merely a drawing of 
an element from one of the classes which make up the population. 

The population is characterized by these classes and by the 
proportion of all elements in each class. We shall use the symbols 
C_i for the ith class, and P_i for the proportion of elements of the 
population contained in this class. 

There are two kinds of populations. In one kind the elements 
making up the population actually exist. Such a population is 
encountered in making a survey of a community. If the variable 
studied is, say, family size, a single family is an element of the 
population, and all these elements may be considered as distributed 
in classes according to their numerical value. If the 
community is large the study of family size may be conducted 
by use of a sample. 

Another kind of population is usually involved in experiments. 
Here the totality of elements is only conceptual and not existent. 
For such a population it is preferable to think of P_i as the chance, 
or probability, that an observation is in class C_i. Thus, in throwing 
a coin the population consists of the readings on all possible 
throws, but since “all possible throws” can never be made, the 
totality of all such readings is only conceptual. Suppose that 
the class C_1 is head and the class C_2 is tail. If the coin is unbiased 
and the throw is fair, then P_1 = 1/2 is the chance of obtaining a 
head, and P_2 = 1/2 is the chance of obtaining a tail. The experiments 
described in Chapter 1 were based on populations which 
are conceptual. For these reasons the conclusions drawn from 
an experiment are conclusions about a hypothetical population 
defined by the classes and by related proportions, rather than 
about an existing population. 

Random Observations. Statistical logic has as its goal the 
generalizing from a sample to a population. The degree of con- 
fidence which can be placed in such a generalization can be es- 
timated only if the observations may be considered as drawn at 
random from the population. Consequently, much thought must 
be given to methods of achieving randomness when a statistical 
investigation is planned. 

When a statistical study deals with observations on a finite 
population, say the population of a city, an element is considered 
to be drawn at random if, and only if, some device is used by 
means of which any one element of the population is as likely to 
be drawn as any other element. This situation never obtains 
by conscious choice. It is completely impossible for a person to 
pick a book “at random” from a bookcase or even to pluck a 
blade of grass “at random” if he looks at it. Some device designed 
to produce a random selection is essential. If the population is 
finite, each individual in it may be given a number, those num- 
bers may be written on little tags, the tags placed in a large 
container and thoroughly stirred, and a tag drawn out by a blind- 
folded person, or shaken out by some mechanical device. That 
would be correct but laborious. An easier and equally correct 
procedure is to make use of a table of random numbers such as is 
described in Chapter 6. 

The criterion that every individual in the population must 
have the same chance to be included in the sample is necessary 
but is not sufficient to define a random sample. Suppose, for 
example, that one has 5000 cards of members of a union arranged 
alphabetically, and that a sample of 50 is needed. Suppose the 
method for sampling is to select an individual card at random 
and then take every 10th card in succession after it until 50 cards 
have been drawn. Any individual has the same chance to be 
chosen as any other but this is not a random sample. It is also 
necessary that every set of 50 must have the same chance to be 
chosen as any other set of 50 and there are many sets of 50 cards 
which could not possibly be obtained by this procedure. A better 
way would be to go on drawing individuals at random until 50 
have been chosen. Strictly speaking, each card should be returned 
to the file after it has been drawn so that it could possibly be 
drawn more than once, but that is seldom done and failure to 
replace the individual after drawing has negligible effect on the 
outcome unless the population is small. 
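
The difference between the two procedures can be made concrete by counting the samples each one can produce. The sketch below is illustrative and rests on stated assumptions: cards are numbered 0 to 4999, the systematic procedure wraps around to the start of the file if it runs past the last card (the text does not say what happens there), and the function names are hypothetical.

```python
import math
import random


def systematic_sample(population_size, sample_size, step, start):
    """Take card `start`, then every `step`-th card after it.

    Wrapping around past the last card is an assumption of this sketch.
    """
    return [(start + i * step) % population_size for i in range(sample_size)]


def simple_random_sample(population_size, sample_size, rng):
    """Draw cards one at a time at random, without replacement."""
    return rng.sample(range(population_size), sample_size)


if __name__ == "__main__":
    N, n, step = 5000, 50, 10
    # Each starting card gives exactly one systematic sample.
    possible = {frozenset(systematic_sample(N, n, step, s)) for s in range(N)}
    print("possible systematic samples:", len(possible))
    print("digits in the number of possible sets of 50:", len(str(math.comb(N, n))))
    rng = random.Random(0)  # fixed seed only so the run is repeatable
    print(sorted(simple_random_sample(N, n, rng))[:5])
```

Under these assumptions every card appears in the same number of systematic samples, so each individual has the same chance of selection, yet almost all sets of 50 cards can never be drawn.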

In an experimental study it is not possible to enumerate all 
elements and to sample them randomly in the manner just de- 
scribed. In such a study it is necessary to plan the experiment 
so as to provide a fair representation of the population. To take 
a simple example, consider an experiment to test the hypothesis 
that a given coin is unbiased, that in a fair toss it is as likely to 
fall head up as to fall tail up. An experiment requires that a 
number of fair tosses be made. To insure fairness several persons 
may be asked to make tosses. The several samples of tosses may 
then be compared, or combined. 

If the problem is extended to a test of whether all 25-cent 
coins are unbiased, evidence from one coin would not be sufficient. 
Samples from a number of sources would be selected and each coin 
subjected to tosses by one or more individuals. 

The experiments described in Chapter 1 must also be ap- 
proached from this point of view. The photographs must be 
selected so as to provide a fair representation of the material which 
the subject may normally be expected to compare. Also the time 
chosen for the experiment must be fair to the subject. It is ob- 
viously impossible to choose these elements randomly from all 
possible elements. However, random devices can help in assuring 
fairness. 

In the experiment in which photographs are to be arranged in 
the order of intelligence of the persons photographed, the photo- 
graphs may be chosen randomly from a larger stock of available 
photographs. In this way the bias of the experimenter is elimi- 
nated. If possible the time, or repetitions of time, of the experi- 
ment may be randomized. 

In the experiment in which pairs of photographs are compared, 
it is desirable not only to choose them at random from a larger set 
of pairs, but also to randomize the order in which the pairs are 
presented. In addition each pair should be shuffled so that the 
photograph representing the higher intelligence is as often on the 
right as it is on the left. The composition of the pairs may itself 
be determined by a random process. 




In each experiment the element of randomization must be 
considered to insure a fair representation of the population which 
is studied. In an experiment comparing two teaching methods 
both methods may be tried by several teachers in each of several 
communities. Ideally the communities should be chosen at ran- 
dom. In addition, within each community, half the available 
teachers should be assigned at random to teach by one of the 
methods and half by the other. If such precautions are followed, 
the probability laws discussed in this book are applicable to the 


experiment. 
Probability. Suppose that in a population each observation 
falls in one and only one of the classes Су, С»... Cx; and sup- 


pose the proportion of observations in these classes to be Pi, 
Pa... Рь. Нап observation is drawn at random from the entire 
population, P; is the probability that it will come from class €; 
the probability that it will come from C; is Рз; and, in general, 
the probability that it will come from C; is Р;. In Table 1.1 on 
page 4 there are five classes (or categories) defined in terms of 
the number of photographs correctly placed. "These classes may 
be designated Co, Ci, Сз, Cs, and Са, and the corresponding pro- 
portions are з, 3%, 2, 2 and jy. These proportions are properly 
called probabilities. Similarly Table 1.2 has six classes and the 
corresponding proportions are called probabilities. 

In the situations considered in Chapter 1 the student could 
compute the probability that a random observation would fall in 
& particular class, because he could enumerate all possible forms 
that an observation could take. Later on we shall deal with prob- 
lems for which the appropriate probability must be read from a 
table prepared by a mathematician. Before that stage is reached, 
it is important for the student to develop clear concepts about the 
meaning of probability in these simpler situations. 

Probability Distribution. A distribution showing a set of defined 
classes C_1, C_2, ..., C_k and the probability that a random 
observation X will fall in each of those classes is called a probability 
distribution. Thus the tables in Chapter 1 display probability 
distributions. 

Sometimes this random observation is a group character com- 
puted from a sample, such as a mean, or a standard deviation, or 
a per cent, or a coefficient of correlation, etc. A group character 
obtained from an observed sample is called a statistic. A probability 
distribution showing the probability that a statistic from a 
random sample will fall in each of a set of defined classes is called 
a sampling distribution. In later chapters we shall speak of the 
sampling distribution of the mean, the sampling distribution of a 
per cent, and the like. 

Symbolism. A shorthand manner of writing probability state- 
ments is convenient because it reduces the amount of writing to 
be done and enables the eye to take in at a glance an idea which 
would be grasped more slowly if expressed entirely in words. The 
sentence “P_i is the probability that a random observation X will 
fall in the class C_i” would be written symbolically as 

        P(X is in C_i) = P_i 
or      Pr(X is in C_i) = P_i 

Sometimes curled braces { } are used instead of round parentheses. 
The expression inside the parentheses may be varied to suit the 
occasion. Thus for the data of Table 1.1 one might write 

        P{4 correct placements} = .042 
        P{2 correct placements} = .250, etc. 

and for the data of Table 1.2 on page 7 one might write 

        Pr(all choices correct) = .0312 
        Pr(3 choices correct) = .3125, etc. 

The probability that a head will appear on a single toss of a coin 
might be written merely P(head) or P(H). P_i is, also for brevity, 
called the probability of the class C_i. 


Symbolic Statements of Probability 

Words                                          Symbol 

The probability that an observation X will     P(X is in C_i) 
fall in Class C_i 

The probability that an observation X will     P(X is in C_1 or C_2) 
fall either in Class C_1 or in Class C_2 

The probability that if a first observation    P(X_2 is in C_j | X_1 is in C_i) 
has been found to be in Class C_i, a second 
will be found to be in Class C_j 


Sometimes it is desirable to include in the probability symbolism 
a conditional clause, such as “the probability of four correct 
choices when the number of choices to be made is 5,” or 
“the probability of placing exactly 2 photographs in correct 
position if the number of photographs to be arranged is 4,” or 
“the probability that a second observation falls in class j when a 
first observation has already fallen in class i.” For this purpose 
a vertical line is used to mean “when” or “if” and the conditional 
clause is written after the line. The three sentences just quoted 
might be written as follows: 

        P(4 choices correct | N = 5) 
        P(2 correct | N = 4) 
        P(X_2 is in C_j | X_1 is in C_i) 


Mutually Exclusive Classes. In a system of classification, if 
it is impossible for the same observation to be placed in each of 
two classes simultaneously, those classes are said to be incom- 
patible or mutually exclusive. Thus, if children are classified by 
age, the class of 5-year olds and the class of 6-year olds are mu- 
tually exclusive. If children are classified by sex the class of boys 
and the class of girls are mutually exclusive. However, the class 
of boys and the class of 6-year olds are not mutually exclusive 
because an individual may belong in both classes simultaneously. 

Exhaustive Classes. If in a system of classification every 
observation must be in at least one of the classes named, then 
those classes are exhaustive. Thus in Table 1.1 every possible 
selection must fall in one of the five classes named, no other 
classes are possible in this experiment, so these five classes are 
exhaustive as well as mutually exclusive. 

In a group of children aged 4 to 10, the class of 5-year olds 
and the class of 6-year olds are mutually exclusive but not ex- 
haustive; the class of age 4 to 7 and the class of age 6 to 10 are 
exhaustive but not mutually exclusive; the class of boys and the 
class of girls are both exhaustive and mutually exclusive. 

If classes are both exhaustive and mutually exclusive the sum 
of their probabilities is 1, as in Tables 1.1 and 1.2. 

Independence of Observations. Independence of observations 
is an important idea which will occur again and again throughout 
this book. Two observations are considered to be independent when 
information about one of them provides no clue whatever as to the 
other. Two observations are considered to be dependent if it is 
possible to make a better guess about one of them when you know 
what the other one is. 

An illustration will clarify this concept. Suppose a housing 
survey of a community is to be made by means of a sample in 
order to determine the extent of overcrowding, and suppose a list 
of all dwelling units in the community is available. Suppose 
further that by some random procedure a dwelling is selected, 
the interviewer visits it, and finds that it is a one-family house 
of 10 rooms occupied by 3 people. Now let a second dwelling 
unit be selected at random. Does the nature of the first observa- 
tion affect in any way the probability that the second observation 
will be a substandard dwelling? A superior housing unit? If 
selection is made at random, the probability of obtaining a particular 
type of home is the same at every selection regardless of 
what was obtained on an earlier observation. In such a situation 
observations are independent. 

Now by contrast, let us assume that in order to save time and 
travel, an interviewer who has been asked to visit 5 homes takes 
all of them in the same block. If one of these homes is the 10- 
room house with 3 occupants mentioned in the preceding para- 
graph, the probability that any of the others will be a cold-water 
flat of 3 rooms occupied by 5 persons is very much smaller than 
the probability that a dwelling drawn at random from the entire 
list would be such a cold-water flat. These 5 observations made 
on homes in one block are not independent. 

Observations which are independent with respect to one popu- 
lation may be dependent with respect to another. If the popula- 
tion being studied is the families living in Van Buren County, 
Iowa, a random sample of families from that county provides a 
set of independent observations. However, if the population 
being studied is the families in rural Iowa, then a random sample 
of the families in Van Buren County provides a set of dependent 
observations. 


Laws for the Combination of Probabilities 


Addition Law: If two or more classes are mutually exclusive, the proba- 
bility that an observation will be in one or the other is the sum of the 
probabilities of the separate classes. 


Multiplication Law: If X_1 and X_2 are independent observations, the 
joint probability that X_1 will be in C_i and X_2 will be in C_j is the product 
of their separate probabilities. 
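
Both laws can be checked on the throw of balanced dice, where every outcome can be listed. The sketch below is an illustration only; the helper names `prob_one_die` and `prob_two_dice` are assumptions. The first helper applies the addition law to the mutually exclusive faces of one die. The second counts ordered pairs directly, so that its result can be compared with the product given by the multiplication law.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)
P_FACE = Fraction(1, 6)  # assumes a perfectly balanced die


def prob_one_die(event):
    """Addition law: sum the probabilities of the mutually exclusive faces."""
    return sum((P_FACE for face in FACES if event(face)), Fraction(0))


def prob_two_dice(event):
    """Enumerate all 36 equally likely ordered pairs of faces."""
    pairs = list(product(FACES, repeat=2))
    favourable = sum(1 for x1, x2 in pairs if event(x1, x2))
    return Fraction(favourable, len(pairs))


if __name__ == "__main__":
    p_odd = prob_one_die(lambda x: x % 2 == 1)
    p_both_odd = prob_two_dice(lambda x1, x2: x1 % 2 == 1 and x2 % 2 == 1)
    print(p_odd, p_both_odd, p_odd * p_odd)
```

Because the two dice are independent, the enumerated probability that both are odd equals the product of the separate probabilities.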


Several observations on one individual may be considered independent 
if that individual alone is being studied. For example, 
one might be interested in variations of his blood pressure and 
might make observations at varying times. In this case, general- 
ization would be to the blood pressure of this particular person. 
If a population of persons is being studied, several measurements 
of the blood pressure of the same person on different occasions 
would be dependent. 

The Probability of Two or More Occurrences. The impor- 
tance of the ideas of dependence and independence among observa- 
tions is due to the probability relations in the possible outcomes 
of the observations. 

Consider again the survey of a community with the purpose of 
determining overcrowding in homes. Suppose that a standard 
of comfort in the home has been adopted and that by this standard 
one third of the families in the community live in overcrowded 
conditions. Suppose also, that the community is fairly large. 
Then, if two families are observed independently, the probability 
that both live in overcrowded conditions is 1/3 × 1/3 = 1/9. In a sample 
of independent observations, the probability that all or even a 
considerable proportion of families selected independently for ob- 
servation live in overcrowded conditions is very small. The 
probability is high that in such a sample the proportion of families 
living in overcrowded homes will not differ greatly from 1/3. 

If, however, one observation is made by choosing a family at 
random and then a neighbor of this family is questioned to pro- 
vide a second observation, the probability that the second family 
will live in an overcrowded home is determined by the home con- 
ditions of the first family. If the first family lives in an over- 
crowded home then the chance that their neighbor will also live 
in such a home is likely to be far in excess of 1/3. The joint probability 
that both families live in overcrowded homes is then a 
number far greater than 1/9. If an entire sample is chosen from the 
same neighborhood the proportion of overcrowded homes in the 
sample is likely to differ greatly from the corresponding propor- 
tion in the population. 
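
The contrast between the two cases can be written out symbolically. In the sketch below the conditional probability 3/4 is a purely hypothetical figure chosen for illustration; the text states only that the chance for a neighbor is likely to be far in excess of one third.

```latex
% Independent observations (multiplication law):
P(X_1 \text{ and } X_2 \text{ overcrowded})
  = \tfrac{1}{3} \cdot \tfrac{1}{3} = \tfrac{1}{9}

% Neighboring families (dependent observations), with a hypothetical
% conditional probability of 3/4:
P(X_1 \text{ and } X_2 \text{ overcrowded})
  = P(X_1 \text{ overcrowded}) \cdot
    P(X_2 \text{ overcrowded} \mid X_1 \text{ overcrowded})
  = \tfrac{1}{3} \cdot \tfrac{3}{4} = \tfrac{1}{4}
```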

Consider a very extreme case in which one family is chosen at 
random and then all the other families in the sample are taken 
from the immediate neighborhood of the first one. The survey 
might very well indicate that all families in the community lived 
in substandard houses, or that all lived in affluence. It could not 
yield information of much value. 

It should be recognized that in some situations it is an excellent 
procedure to choose many neighborhoods (say m) at random and 
then to make observations on a predetermined number of indi- 
viduals (say n) in each neighborhood. However, the results 
cannot be treated as a random sample of mn individuals, but must 
be treated by special methods. Moreover, the number of randomly 
selected neighborhoods must not be too small. 


EXERCISE 2.1 
1. Let X represent the result of throwing a die. All possible results 
may be grouped in six classes, each class defined by the number appearing 
on the exposed face of the die. Thus to say X = 5 is to state that the result 
falls in the class characterized by having the exposed number a 5. As 
these six classes are exhaustive and mutually exclusive, the sum of the 
probabilities associated with them is 1. If the die is perfectly balanced, 
all classes have the same probability and so the probability of any one 
class (that is the probability of the appearance of any particular face) 
is 1/6. Translate the following statements into symbols. Note: Slight 
variations from the form given in the answer key may be acceptable. 
a. The probability of obtaining a 5 is 1/6. Ans. P(X = 5) = 1/6 
b. The probability of obtaining a 1 is the same as the probability of 
obtaining a 2 or a 3 ... or a 6. 
c. The probability of obtaining a 1 or a 2 is 1/6 + 1/6 = 1/3. 
d. The probability of obtaining an odd number is 1/6 + 1/6 + 1/6 = 1/2. 
e. The probability of obtaining a number smaller than 4 is 1/2. Note 
these definitions: 

    X > A    X is greater than A 
    X < A    X is less than A 
    X ≥ A    X is greater than or equal to A, that is 
             X is not less than A 
    X ≤ A    X is less than or equal to A, that is 
             X is not greater than A 

Ans. P(X < 4) = P(X = 1 or 2 or 3) 
              = P(X = 1) + P(X = 2) + P(X = 3) 
              = 1/6 + 1/6 + 1/6 = 1/2 

f. The probability of not obtaining a 6 is 1 minus the probability of 
obtaining a 6, and this is 1 - 1/6 = 5/6. 

2. Two perfectly balanced dice are thrown. Let X_1 represent the 
number exposed on the first and X_2 the number exposed on the second. 
Translate the following symbols into words. 

a. P(X_1 = 3 and X_2 = 3) = P(X_1 = 3) · P(X_2 = 3) = 1/6 · 1/6 = 1/36 

b. P(X_1 + X_2 = 2) = P(X_1 = 1) · P(X_2 = 1) = 1/6 · 1/6 = 1/36 

с. Р(Х +X: = 8) = Р(Х, = "risum оге ст Ж 1) 


Siete eos = 




а вој 

еро = 6 | X: Bia 

f. P(X_1 and X_2 are both odd) = P(X_1 odd) · P(X_2 odd) 
   = P(X_1 = 1 or 3 or 5) · P(X_2 = 1 or 3 or 5) 
   = (1/6 + 1/6 + 1/6) · (1/6 + 1/6 + 1/6) = 1/2 · 1/2 = 1/4 

g. P(X_1 and X_2 both < 6) = P(X_1 < 6) · P(X_2 < 6) 
   = (1 - 1/6) · (1 - 1/6) = 5/6 · 5/6 = 25/36 

h. P(X_1 + X_2 = 11) = P(X_1 = 6 and X_2 = 5) + P(X_1 = 5 and X_2 = 6) 
   = P(X_1 = 6) · P(X_2 = 5) + P(X_1 = 5) · P(X_2 = 6) 
   = 1/36 + 1/36 = 1/18 

Factorial. It is convenient to have a term and a symbol to 
represent the product of consecutive integers beginning with 1. 
Thus 1·2·3·4·5·6·7 = 5040 is called factorial 7 or 7 factorial. 
It is denoted by the symbol 7! In general if N is a positive integer, 


(2.1)    $N! = N(N-1)(N-2) \cdots 4 \cdot 3 \cdot 2 \cdot 1$ 


It is convenient to define factorial zero as equal to 1. This defini- 
tion simplifies notation and is consistent with other notation of 
factorials. 


The symbol $\binom{N}{r}$ is used to represent a relation which occurs 
often in work with probability, especially in developing the binomial 
distribution, below. It is defined as 

(2.2)    $\binom{N}{r} = \frac{N!}{r!\,(N-r)!}$ 

This relation is of interest because it gives the number of ways in 
which r objects can be selected from a group of N objects. 
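
Both definitions are easily checked by direct computation. The sketch below is illustrative; the function names are assumptions. It builds N! as a running product, computes the number of selections from the ratio N!/(r!(N - r)!), and compares the result with an explicit listing of the selections.

```python
import math
from itertools import combinations


def factorial(n):
    """N! as the product of the integers 1..N, with 0! defined as 1."""
    result = 1
    for k in range(1, n + 1):
        result *= k
    return result


def n_choose_r(n, r):
    """Number of ways to select r objects from n: N! / (r! (N - r)!)."""
    return factorial(n) // (factorial(r) * factorial(n - r))


if __name__ == "__main__":
    print(factorial(7), factorial(0))
    # Compare the formula with an explicit listing of the selections.
    listed = len(list(combinations(range(8), 3)))
    print(n_choose_r(8, 3), listed, math.comb(8, 3))
```

Defining 0! as 1 is what lets the formula give one way of selecting all N objects, or none of them.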


EXERCISE 2.2 
1. Verify the following relations: 


a. 4! = 4·3·2·1 = 24          g. 8! = 8(7!) 
b. 5! = 5·4·3·2·1 = 120       h. 20! = 20(19)(18!) 
51 _5.4.3.2.1_. . 20! 
р эргүү? i, та! = 380 
а = 72 1 б OD 
„811 к. & = 9(8)(7) 
° 10! 90 1, 20! 210 
! * 6^ 
5 2121 68 6141 




2. Find the values of the following expressions: 


N: _ 19! е. 013! 5n 
"iat * 20! ғ 16:15. (140) 8- oui 

p; 281 а 10! i 15! 101614! 
* 73! өп! * 7812! 


3. Verify the following: 


oe 8\ _ 29) _ 
a. (3) = 56 d. М) = 1 8. (29) et 

TAN M o [2 р 
b. ( E - 126 e. ( | 220 ь. (5) = 220 
5 m = 20 + ien = 2300 i 9 = 4368 


A Population of Two Classes. Perhaps the simplest popula- 
tion which can be considered is one which consists of only two 
classes. Such a population was described in Chapter 1 in the 
experiment to judge the ability to select the more intelligent of 
two persons by comparing their photographs. The population 
considered there consists of two classes: 1. the class of correct 
selections, and 2. the class of incorrect selections. A population or 
sample divided into two classes is often termed a dichotomy. 

The hypothesis formulated in Chapter 1 that a correct selec- 
tion was as likely to occur as an incorrect may be formulated in 
the language of this present chapter by saying that the propor- 
tion of elements in each class is one half. 

A different hypothesis as to the number of correct solutions 
would provide a different description of the population. Con- 
sider, for example, the hypothesis that a correct selection is four 
times as likely to occur as an incorrect selection. The class of 
correct selections would then contain 4/5 of the elements, whereas 
the class of incorrect selections would contain 1/5 of the elements. 
The two populations described by these two hypotheses are dis- 
played graphically in Figures 2-1 and 2-2. The proportion of 
correct selections may be any number between zero and one 
depending on the circumstances. The proportion of incorrect 
selections is one minus the proportion of correct selections. 

Populations consisting of two classes occur very often, for 
example: male and female, married and unmarried, passed and 
failed. Any population of two classes is fully determined by the 
proportion of elements in either one of the classes. For convenience 
in writing formulas the symbol P will be introduced to 
signify the proportion of elements in one of the classes of a two-class 
population. The proportion in the other class is 1 - P. For 
simplicity in writing it is customary to use the symbol Q for 
1 - P. 


Fig. 2-1. Population distribution under the hypothesis that correct and incorrect selections are equally likely. (Classes shown: Incorrect Selections, Correct Selections.)

Fig. 2-2. Population distribution under the hypothesis that correct selections are four times as likely as incorrect. (Classes shown: Incorrect Selections, Correct Selections.)


While P may vary from population to population it is a fixed 
number for any one population. The proportion in a sample 
should be distinguished from that in the population. Thus if a 
population of persons contains 50% men, a sample, even though 
random, might contain 40%, or 70%, or even 100%, men. If the 
sample is random, and not very small, the probability of extreme 
deviation from 50% men is small. 

To distinguish between the proportion in the population, which 
we have called P and the proportion in a sample, we shall use a 
lower case p to denote the latter. 

Parameter and Statistic. A quantity like P which distin- 
guishes one population from another similar population is called 
a parameter. The reader should note that P stands for proportion
and not for parameter. Other parameters such as the mean and 
standard deviation will be introduced later. Many populations 
involve more than one parameter. The sample proportion p is 
called a statistic. It varies from sample to sample of any given 
population. Thus for random samples out of a population with 
parameter P, the sample statistic p fluctuates around P. 




The Binomial Distribution. The binomial distribution is the 
sampling distribution of p for random samples out of a two-class 
population. This statement means that the terms of the bi- 
nomial distribution provide the probability for each possible 
value that p may take in a random sample of N elements. 

To make the discussion concrete, let us return to the experi- 
ment for selecting the more intelligent of two persons by a com- 
parison of their photographs. Suppose that in the population the 
probability of a correct solution is P and hence the probability 
of an incorrect solution is Q. Suppose a random sample of two
pairs of photographs is taken. Since the two pairs are independent 
observations, the multiplication law for combining probabilities 
applies. The four possible outcomes and the probability asso- 
ciated with each are: 


Outcome   First selection   Second selection   Probability
   1      Correct           Correct            $P \times P = P^2$
   2      Correct           Incorrect          $P \times Q = PQ$
   3      Incorrect         Correct            $Q \times P = PQ$
   4      Incorrect         Incorrect          $Q \times Q = Q^2$


These four outcomes might be described in terms of the numbers 
of correct selections, which are respectively 2, 1, 1, and 0. They 
might also be described in terms of the proportions of selections 
made correctly, which are respectively 2/2 = 1.00, 1/2 = .50, 1/2 = .50, and
0/2 = 0. In either case the two middle outcomes may be grouped
together, as they yield the same number of correct selections. 
The sampling distribution for samples of 2 cases is then: 


Np                    p                       Probability
Number of correct     Proportion of correct
selections            selections

2                     1.00                    $P^2$
1                      .50                    $2PQ$
0                      .00                    $Q^2$


The sum of the probabilities is $Q^2 + 2PQ + P^2$ which is

$(Q + P)^2 = (1)^2 = 1$.
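The multiplication-law argument above can be checked by brute force. The following is a minimal sketch, not part of the original text: it lists every ordered outcome for a sample of n pairs, multiplies the probabilities of the independent selections, and groups the outcomes by number of correct selections. The function name and the value P = .5 are illustrative assumptions.

```python
from itertools import product

def enumerate_sampling_distribution(n_pairs, p_correct):
    """Group all ordered outcomes by the number of correct selections."""
    q = 1.0 - p_correct
    dist = {k: 0.0 for k in range(n_pairs + 1)}
    # Each outcome is a tuple such as (True, False): correct, then incorrect.
    for outcome in product((True, False), repeat=n_pairs):
        prob = 1.0
        for correct in outcome:
            # Multiplication law for independent observations.
            prob *= p_correct if correct else q
        dist[sum(outcome)] += prob
    return dist

dist2 = enumerate_sampling_distribution(2, 0.5)  # samples of 2 pairs
dist3 = enumerate_sampling_distribution(3, 0.5)  # samples of 3 pairs
for k in sorted(dist3, reverse=True):
    print(k, dist3[k])
```

With P = .5 the sample of 2 gives the probabilities $Q^2$, $2PQ$, $P^2$ as .25, .50, .25, and the same function handles the sample of 3 suggested to the student.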
For samples of any number of pairs the sampling distribution may
be developed in similar fashion, but the task is a very tedious one
unless N is small. The student may find it instructive to work
out the distribution for samples of 3. The following presentation
shows probabilities for samples of 5 pairs. Generalization from
5 to other values of N can be facilitated by writing the numerical
coefficients in the binomial-coefficient notation, for example
$\binom{5}{4}$, and that has been done in the right-hand column.


Np                    p                       Probability
Number of correct     Proportion of correct
selections in         selections in samples
samples of 5          of 5

5                     1.00                    $P^5 = \binom{5}{5} P^5$
4                      .80                    $5QP^4 = \binom{5}{4} QP^4$
3                      .60                    $10Q^2P^3 = \binom{5}{3} Q^2P^3$
2                      .40                    $10Q^3P^2 = \binom{5}{2} Q^3P^2$
1                      .20                    $5Q^4P = \binom{5}{1} Q^4P$
0                      .00                    $Q^5 = \binom{5}{0} Q^5$


The sum of the six probabilities is $(Q + P)^5 = (1)^5 = 1$. Here
we have shown the binomial sampling distribution for samples of 5.
To generalize, call the elements in one of the two population
classes "successes" and let P be the probability of a success.
Then for a random sample of N cases the probability of exactly
T successes is


(2.3)    $\binom{N}{T} P^T Q^{N-T}$


This generalization leads to the sampling distribution of p for
samples of N cases from a two-class population, which is given in 
Table 2.1. 

The probabilities in the informal table previously given for 
samples of 5, and the probabilities in Table 2.1 for samples of any 
size apply either to the proportion of successes or to the number 
of successes. 
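Formula (2.3) translates directly into a few lines of code. The sketch below is an illustration only; the function name is an assumption, and `math.comb` supplies the binomial coefficient.

```python
from math import comb

def binomial_probability(n, t, p):
    """Probability of exactly t successes in a random sample of n cases."""
    q = 1 - p
    return comb(n, t) * p**t * q**(n - t)

# Numerical coefficients for samples of 5, from 5 successes down to 0.
coefficients = [comb(5, t) for t in range(5, -1, -1)]

# Probabilities for samples of 5 when P = .5, from 0 successes up to 5.
probs = [binomial_probability(5, t, 0.5) for t in range(6)]
print(coefficients)
print(probs, sum(probs))
```

The coefficients reproduce the 1, 5, 10, 10, 5, 1 of the informal table for samples of 5, and the probabilities sum to 1 as the expansion of $(Q + P)^5$ requires.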

TABLE 2.1 The Binomial Distribution

Sampling distribution for p from samples of N
cases from a two-class population

Np        p            Probability

N         1            $P^N$
N - 1     (N - 1)/N    $\binom{N}{N-1} Q P^{N-1}$
N - 2     (N - 2)/N    $\binom{N}{N-2} Q^2 P^{N-2}$
. . .     . . .        . . .
T         T/N          $\binom{N}{T} Q^{N-T} P^T$
. . .     . . .        . . .
1         1/N          $\binom{N}{1} Q^{N-1} P$
0         0            $Q^N$
Total                  $(Q + P)^N = 1$


The fact that the sum of probabilities in the right column is
$(Q + P)^N$ is of interest since it illustrates the fact that the
binomial distribution is a special case of the binomial expansion
which is treated in textbooks on algebra. Because Q + P is always
equal to 1, $(Q + P)^N = 1$ no matter what value N has. Inasmuch
as the classes defined by the number of successes are mutually
exclusive and exhaustive, the sum of their probabilities must
be 1.00.

The Graphic Representation of the Binomial Distribution. 
Figures 2-1 and 2-2 on page 20 showed two populations in which
P = .5 and P = .8 respectively. Suppose a sample of 5 cases is
drawn from each of these populations. The number of successes
which can be obtained in that sample can only be one of the
integers 0, 1, 2, 3, 4, or 5 and so the proportion of successes can
only be 0/5, 1/5, 2/5, 3/5, 4/5, or 5/5, as shown in Figure 2-3. These values
are represented as discrete points on a linear scale. The variable
exists only at these points.

Fig. 2-3. Possible values of p in samples of 5 cases. (Points at 0, .2, .4, .6, .8, and 1.0 on a linear scale.)

The probability associated with each
value of p for samples of 5 from a population in which P = .5
is given numerically in column 3 of Table 2.2 and shown by a
vertical line in Figure 2-4. These probabilities are the respective
terms of $(.5 + .5)^5$. The corresponding probabilities when P = .8,
obtained from the terms of $(.2 + .8)^5$, are given numerically in
column 5 of Table 2.2 and shown graphically in Figure 2-5.


Fig. 2-4. Distribution of p in samples of 5 from a two-class population in which P = .5, p being represented as a discrete variable. (Vertical axis: Probability, 0 to 1.0; horizontal axis: Scale of p.)

Fig. 2-5. Distribution of p in samples of 5 from a two-class population in which P = .8, p being represented as a discrete variable. (Vertical axis: Probability, 0 to 1.0; horizontal axis: Scale of p.)


Even in elementary work in statistics it often is found con- 
venient to treat a discrete variable as though it were continuous. 
Thus the mean number of children in a group of families might 
be reported as $\bar{X} = 1.53$, or the 90th percentile reported as 3.87.


TABLE 2.2 Probability Associated with each Possible Value of p in Samples
of 5 when P = .5 and when P = .8, and Corresponding Cumulative Probability

                          P = .5                       P = .8
Value     Value                     Cumulative                   Cumulative
of Np     of p      Probability     probability   Probability    probability
  1         2           3               4             5              6

  5        1.0        .03125         1.00000        .32768        1.00000
  4         .8        .15625          .96875        .40960         .67232
  3         .6        .31250          .81250        .20480         .26272
  2         .4        .31250          .50000        .05120         .05792
  1         .2        .15625          .18750        .00640         .00672
  0         .0        .03125          .03125        .00032         .00032

Sum                  1.00000                       1.00000


These numbers are obviously artifacts but they are meaningful
artifacts. No family has 1.53 children but the statistic gives useful
information about the families under consideration. The
distribution of the number of children per family might be represented
graphically as a histogram where the numbers 0, 1, 2, . . . in- 
dicating number of children per family are represented not as 
discrete points on a line but as midpoints of intervals. In such 
a figure the number of families with 2 children would be represented 
by the area of a rectangle with base extending from 1.5 to 2.5 on 
the horizontal axis and with height proportional to the number of 
families having 2 children. The number 0 on the horizontal scale 
would be represented by an interval extending from — .5 to + .5 
even though no families could have a negative number of children. 
All intervals being of the same width the areas of the rectangles 
which form the histogram are proportional to their heights and 
to the frequencies represented. 

The binomial probability distribution may also be represented 
by a histogram which, although an artifact, brings out certain 
general concepts better than the discrete distribution. Figures 
2-6 and 2-7 are such histograms superimposed on the graphs of 
the discrete distributions of Figures 2-4 and 2-5. In these figures 
probability is represented by area. The ordinate is then called 
the probability density. 


Fig. 2-6. Distribution of p in samples of 5 from a two-class population in which P = .5, p being represented as a continuous variable. (Vertical axis: Probability Density; horizontal axis: Scale of p.)

Fig. 2-7. Distribution of p in samples of 5 from a two-class population in which P = .8, p being represented as a continuous variable. (Vertical axis: Probability Density; horizontal axis: Scale of p.)


Suppose we wish to represent the probability that p = .4 or 
less when P = .5. By the Addition Law stated on page 15 this is 


P(p = 0) + P(p = .2) + P(p = .4) = .03125 + .15625 + .31250 = .50


as indicated by the cumulative probability in column 4 of Table
2.2. This probability is the sum of the three vertical lines at
p = 0, p = .2, and p = .4 in Figure 2-4 but that sum is not
particularly easy to visualize. The same probability is represented
by the area to the left of p = .45 in Figure 2-6. This area is easily 
visualized but its numerical value cannot be directly read from
the scale. Figure 2-8 represents the same cumulative probability
by the length of the vertical line at p = .4. This is readily
visualized and its value can be read directly from the scale. The
cumulative probability curve has great advantages and will be
used frequently throughout this book. The student should study
(1) the way in which Figures 2-8 and 2-9 are plotted from the
cumulative probabilities in Table 2.2, (2) the relation of Figures
2-8 and 2-4, (3) the relation of Figures 2-9 and 2-5, and (4) the
difference between Figures 2-8 and 2-9.

Fig. 2-8. Cumulative probability distribution of p in samples of 5 from a two-class population in which P = .5. (Vertical axis: Probability, 0 to 1.0; horizontal axis: Scale of p.)

Fig. 2-9. Cumulative probability distribution of p in samples of 5 from a two-class population in which P = .8. (Vertical axis: Probability, 0 to 1.0; horizontal axis: Scale of p.)
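The probabilities and cumulative probabilities of Table 2.2 can be regenerated by machine. The following is a minimal sketch under the assumption that each cumulative probability is the running sum of the terms from 0 successes upward; the function name `table_rows` is illustrative and does not come from the text.

```python
from itertools import accumulate
from math import comb

def table_rows(n, p):
    """Return (Np, p, probability, cumulative probability) for 0..n successes."""
    probs = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    cumulative = list(accumulate(probs))  # running sum from 0 successes upward
    return [(k, k / n, probs[k], cumulative[k]) for k in range(n + 1)]

rows_half = table_rows(5, 0.5)   # columns 3 and 4 of Table 2.2
rows_eight = table_rows(5, 0.8)  # columns 5 and 6 of Table 2.2

# Print in the order of the table, from 5 successes down to 0.
for k, prop, prob, cum in reversed(rows_eight):
    print(f"{k}  {prop:.1f}  {prob:.5f}  {cum:.5f}")
```

The heights of the vertical lines in Figures 2-8 and 2-9 are the last element of each row.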


EXERCISE 2.3 

1. If samples of 4 cases each are drawn from a very large population 
in which 30% of the individuals have a certain trait, what is the sampling 
distribution of the number having that trait? Make a graph of the popula- 
tion distribution. Make a graph of the sampling distribution. 

2a. If samples of 3 cases each are drawn from a very large population 
in which P = .50, what is the sampling distribution of p? Make a graph of
the population distribution; of the sampling distribution. 

b. Do the same for samples of 6 cases. 

c. Do the same for samples of 10 cases. 




3. In drawing samples from a symmetric population in which P = Q 
= .50, what appears to be the effect of increasing the size of the samples? 
Base your answer on the results obtained in question 2. 

4a. If samples of 3 cases each are obtained from a very large population 
in which P = .60, what is the sampling distribution of p? Make a graph of 
the population distribution; of the sampling distribution. 

b. Do the same for samples of 6 cases. 

c. Do the same for samples of 10 cases. 

5. Answer the questions in item 4 for the case in which P = .2. 


Mean of the Probability Distribution for a Discrete Variable. 
The familiar procedure of computing a mean will now be applied 
to a probability distribution in order to develop certain important 
new concepts. In Table 2.3, X is the discrete variable "proportion
of successes in a sample of 5 cases." Instead of the frequencies
you are accustomed to using in such computations, we here use
the probabilities from column 3 of Table 2.2. The mean thus
computed is found to be .5 which was the value of the population
proportion. This mean is called the expected value of p or the
expectation of p and is denoted by the symbol E(p), read "E of p"
or "expectation of p."


TABLE 2.3 Computation of the Mean and Variance of p in Samples
of 5 from a Population in which P = .5


X = Value     Probability          (Probability) X       (Probability) X²
of p          [Analogous to f]     [Analogous to fX]     [Analogous to fX²]

1.0             .03125               .03125                .03125
 .8             .15625               .12500                .10000
 .6             .31250               .18750                .11250
 .4             .31250               .12500                .05000
 .2             .15625               .03125                .00625
 .0             .03125               .00000                .00000

Sum            1.00000               .50000                .30000

Mean = $\frac{\sum (\text{Probability}) X}{\sum \text{probabilities}} = \frac{.5}{1.0} = .5$

Variance = $\frac{\sum (\text{Probability}) X^2}{\sum \text{probabilities}} - (\text{Mean})^2 = \frac{.30}{1.0} - (.5)^2 = .05$

Standard deviation = $\sqrt{.05} = .224$


As we shall wish to use the idea of expected value in reference 
to other statistics as well as p, let us now state it in more general 
terms. Let X denote the variable for which a probability
distribution is given, and let $P_1, P_2, \ldots, P_k$ be the probabilities of
the values $X_1, X_2, \ldots, X_k$. Then the mean of the probability
distribution of X is E(X) and is computed as


(2.4)    $E(X) = \sum P_i X_i$


No division by the sum of the probabilities is needed, since $\sum P_i = 1$.
$\sum P_i$ is an abbreviation for the sum of the $P_i$. A
fuller description of this symbol is given in Chapter 5.


The parenthesis around X must not be interpreted as indicating 
a product. The term expected value originated in games of chance. 
If in such a game there are k different possible outcomes with 
probabilities $P_1, P_2, \ldots, P_k$ and if the amount received by a
player for each of these outcomes is $X_1, X_2, \ldots, X_k$ (where some
of the X's may be negative if forfeits are involved) then in a long
series of games E(X) as given by Formula (2.4) is the player's
expected return per game. The small Greek letter $\mu$ (pronounced
mu) corresponding to the English letter m will be used as an
abbreviation for the mean of a probability distribution.


(2.5)    $\mu_X = E(X)$

For the probability distribution of p,

(2.6)    $\mu_p = E(p) = P$


Variance and Standard Deviation of the Probability Distribu- 
tion for a Discrete Variable. In Table 2.3 the variance and stand- 
ard deviation of p have been computed in the manner presumably 
familiar to you with probabilities taking the place of frequencies.* 

The computation might also have been carried out by sub- 
tracting P from each value of p in turn, squaring, multiplying 
by the probability and summing, thus: 


p - P                 (p - P)²      (p - P)² (Probability)

1.0 - .5 =  .5          .25           .0078125
 .8 - .5 =  .3          .09           .0140625
 .6 - .5 =  .1          .01           .0031250
 .4 - .5 = -.1          .01           .0031250
 .2 - .5 = -.3          .09           .0140625
 .0 - .5 = -.5          .25           .0078125

Sum                                   .0500000


* Persons for whom this is a first course may find it helpful to study the computa- 
tion of mean, variance, and standard deviation in an elementary text, or to turn to 
page 115 of this text. 




This is exactly the same result as obtained in Table 2.3. The 
variance of p will be called the expected value of $(p - \mu_p)^2$,
denoted by the symbol $\sigma_p^2$ ($\sigma$ is the small Greek letter corresponding
to s and is pronounced sigma).


(2.7)    $\sigma_p^2 = E(p - \mu_p)^2 = E(p - P)^2$

In general if X is defined as before to be any variable for which
the probability distribution is given,

(2.8)    $E(X - \mu)^2 = \sum P_i (X_i - \mu)^2 = \sum P_i X_i^2 - \mu^2$

(2.9)    $\sigma_X^2 = E(X - \mu)^2$

The standard deviation $\sigma_X$ is the square root of the variance $\sigma_X^2$.
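Formulas (2.4) and (2.8) can be illustrated with the distribution of Table 2.3. The sketch below is a hedged example rather than the authors' procedure; the function names are assumptions. It computes the variance both from squared deviations and from the shortcut form $\sum P_i X_i^2 - \mu^2$.

```python
def expectation(values, probs):
    """E(X): sum of each value weighted by its probability, Formula (2.4)."""
    return sum(p * x for x, p in zip(values, probs))

def variance_two_ways(values, probs):
    """Return the variance by both forms given in Formula (2.8)."""
    mu = expectation(values, probs)
    by_deviations = sum(p * (x - mu) ** 2 for x, p in zip(values, probs))
    by_squares = sum(p * x * x for x, p in zip(values, probs)) - mu ** 2
    return by_deviations, by_squares

# Values of p and their probabilities for samples of 5 when P = .5.
values = [1.0, 0.8, 0.6, 0.4, 0.2, 0.0]
probs = [0.03125, 0.15625, 0.31250, 0.31250, 0.15625, 0.03125]

mu = expectation(values, probs)
var_dev, var_sq = variance_two_ways(values, probs)
sd = var_dev ** 0.5
print(mu, var_dev, var_sq, sd)
```

Both forms of the variance agree, up to floating-point rounding, with the value .05 found in Table 2.3.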


EXERCISE 2.4 

For each of the following probability distributions compute $\mu = E(X)$
and $\sigma^2 = E(X - \mu)^2$.

1. The distribution in Table 1.1 on page 4 ($\mu = 1$, $\sigma^2 = 1$).

2. The distribution of p in samples of 5 from a population in which
P = .8, using the probabilities given in column 5 of Table 2.2.

($\mu = .8$, $\sigma^2 = .032$.)

3. The distribution of the number of successes (X = Np) in samples
of 5 from a population for which P = .5. Use the probabilities from
Table 2.2. ($\mu = 2.5$, $\sigma^2 = 1.25$.)

4. The distribution of the number of successes (X = Np) in samples
of 5 from a population for which P = .8. ($\mu = 4$, $\sigma^2 = .8$.)

5. The distribution in Table 3.1 on page 45. ($\mu = .5$, $\sigma^2 = .025$.)


Mean and Standard Deviation of a Sample. The notation 
and formulas used in the preceding two sections are applicable 
to the calculation of means and standard deviations of theoretical 
distributions. Other notation and formulas will be used in the 
calculation of these characteristics in samples. A discussion of 
these is deferred to Chapter 5. 

Mean and Standard Deviation of the Binomial Distribution. 
In the preceding sections we dealt in general with the mean, 
variance, and standard deviation of any probability distribution. 
However when the probability distribution has the binomial form 
the computation can be greatly simplified. Let X represent the 
number of individuals having a given trait in a sample of N in- 
dividuals from a population in which P is the proportion having 
that trait. Then the mean is 


(2.10)    $\mu_X = E(X) = NP$,

the variance is

(2.11)    $\sigma_X^2 = E(X - \mu)^2 = NPQ$

and the standard deviation is

(2.12)    $\sigma_X = \sqrt{E(X - \mu)^2} = \sqrt{NPQ}$


The reader may gain understanding of these formulas by applying
them in problems 3 and 4 of Exercise 2.4 where they should
give results identical with those obtained by the longer method 
of computation. 

Let p represent the proportion of individuals having a given 
trait in a sample of N individuals from a population in which P
is the proportion having that trait. The mean of the sampling
distribution of p has already been given in Formula (2.6) as


$\mu_p = E(p) = P$

The variance is


(2.13)    $\sigma_p^2 = E(p - P)^2 = \frac{PQ}{N}$


and the standard deviation is

(2.14)    $\sigma_p = \sqrt{E(p - P)^2} = \sqrt{\frac{PQ}{N}}$


The reader may gain understanding of these formulas by applying 
them in problems 2 and 5 of Exercise 2.4. 
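The agreement between the short formulas and the longer method can also be checked numerically. The following is a sketch only, with an assumed helper name; it takes samples of 5 from a population in which P = .8, the case treated in problems 2 and 4 of Exercise 2.4.

```python
from math import comb, isclose

def binomial_moments_direct(n, p, as_proportion=False):
    """Mean and variance by the longer method, summing over all outcomes."""
    mean = 0.0
    second_moment = 0.0
    for k in range(n + 1):
        prob = comb(n, k) * p**k * (1 - p)**(n - k)
        x = k / n if as_proportion else k  # proportion p or number X = Np
        mean += prob * x
        second_moment += prob * x * x
    return mean, second_moment - mean**2

n, p = 5, 0.8
q = 1 - p
mean_x, var_x = binomial_moments_direct(n, p)                      # number of successes
mean_p, var_p = binomial_moments_direct(n, p, as_proportion=True)  # proportion

print(mean_x, n * p)      # Formula (2.10)
print(var_x, n * p * q)   # Formula (2.11)
print(mean_p, p)          # Formula (2.6)
print(var_p, p * q / n)   # Formula (2.13)
```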


EXERCISE 2.5 

1. Suppose a class which is studying sampling distributions decides 
that each member will spin each of 10 pennies once and record the number 
of heads which appear. Assume the pennies to be relatively free from bias 
so that the probability of a head appearing on one particular spin of one 
penny is Р = .5. 

a. What is the expected number of heads on each spin of 10 pennies? 

b. Will this expected number be affected by the number of repetitions 
of the experiment? Will it be related to the number of persons in the class? 

c. Will this expected number be changed if each person spins 20 instead
of 10 pennies at one time? 

d. Suppose 500 students carry out the experiment and each records 
as an observation the-number-of-heads-on-one-throw-of-10-pennies. Imag- 
ine the frequency distribution of these observations. An observation may 
have any value from 0 to 10. Can you state approximately the value of 
the variance of these observations? Why would this result be only an 
approximation? Would the approximation be better or worse if 10,000 




students report observations than if 500 do so (assuming of course that the 
10,000 observations are made with equal care)? 

e. If each student throws 20 pennies instead of 10 would that change 
the variance discussed in question 1d? 

2. Suppose 500 persons agree to take part in the experiment described 
in the preceding question (or a smaller number of persons make repeated 
throws until there are 500 records). 

a. What are the expected frequencies for the distribution of heads on 
the 500 throws? Let X = number of heads appearing on one throw. Then
P(X) is the probability obtained by taking the appropriate term in the
expansion of $(.5 + .5)^{10}$. The expected frequency for a given $X_i$ is $500P(X_i)$
and is the appropriate term in the expansion of $500(.5 + .5)^{10}$. Complete
the entries in columns F, FX, and FX².

 X      P(X)        F         FX        FX²
10     .00098       .49       4.9       49
 9     .00977      4.88      43.9      395.1
 8     .04395     21.98     175.8
 7
 6
 5
 4
 3
 2
 1
 0

Sum   1.00003    500.08


b. Compute the mean and the variance for the distribution in question 
2a. Compare these answers with the values for $\mu_X$ and $\sigma_X$ obtained by use
of Formulas (2.10) and (2.12).

c. Would the outcome of question 2a have been different if there had 
been 100 repetitions instead of 500? 

d. If the experiment were carried out as described could you be con- 
fident that the distribution of the number of heads in 500 replications 
would exactly conform to the frequency distribution of question 2a? Why?


Approximation to the Binomial Distribution. The computa- 
tions made and the graphs drawn in Exercise 2.3 suggest the fol- 
lowing generalizations: 

a. If P = .5, the graph of the binomial distribution is sym- 
metrical regardless of the size of N. 

b. If P is not .5 the graph of the binomial distribution is 
asymmetrical, or skewed, when N is small but becomes more 
nearly symmetrical as N increases. 




c. As N increases, the steps in the graph of the binomial dis- 
tribution become smaller and the graph takes on more the ap- 
pearance of a smooth curve. 

We may even go further and say that if P is not too near 0
or 1 and if N is large, the smooth curve called the "normal curve"
provides a very good approximation to the binomial. The sum
of any number of terms of the binomial can then be obtained
approximately from the corresponding area under the normal
curve. The larger the value of N the better the approximation is.
In fact, the first treatise ever written about the normal curve
appeared under the title Approximatio ad Summam Terminorum
Binomii $(a + b)^n$ in Seriem Expansi. Being translated this is
"An Approximation to the Sum of the Terms of the Binomial
$(a + b)^n$ Expanded in Series." In this seven-page pamphlet
written in Latin and dated 1733, Abraham De Moivre wrote
"although the solution of problems of chance often require that
several terms of the binomial $(a + b)^n$ be added together,
nevertheless in very high powers the thing appears so laborious, and of
so great difficulty that few people have undertaken the task.” 
Then he proceeds to obtain the formula for the ordinate of the 
probability curve and for the proportion of area lying between 
ordinates at selected points. 
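The statement that the approximation improves as N increases can be examined numerically. The sketch below is an illustration under stated assumptions, not a computation from the text: it takes P = .2, compares the cumulative binomial probability at each number of successes k with the normal area to the left of k + .5 (a half-unit adjustment corresponding to the histogram representation, introduced here as an assumption), and reports the largest discrepancy.

```python
from math import comb, erf, sqrt

def normal_cdf(z):
    """Area under the unit normal curve to the left of z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def max_cdf_error(n, p):
    """Largest gap between binomial and normal cumulative probabilities."""
    q = 1.0 - p
    mean, sd = n * p, sqrt(n * p * q)
    cumulative, worst = 0.0, 0.0
    for k in range(n + 1):
        cumulative += comb(n, k) * p**k * q**(n - k)
        # Normal area to the left of the upper edge of the bar centred at k.
        approx = normal_cdf((k + 0.5 - mean) / sd)
        worst = max(worst, abs(cumulative - approx))
    return worst

errors = {n: max_cdf_error(n, 0.2) for n in (5, 20, 100, 500)}
for n, err in errors.items():
    print(n, round(err, 4))
```

The largest discrepancy is smaller for the larger samples, in line with generalizations b and c above.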

The Normal Distribution. The normal probability distribution
in the adjacent sketch is a probability distribution on a
continuous variable. The scale for values of this variable is located
on the base line. For a continuous probability distribution one
does not speak of a probability corresponding to a particular value of
the variable, but rather of the probability that a value of the
variable is less than a given number or more than a given number,
or that it occurs between two given numbers. The probability
is then an area under the curve. The probability that a value
of the variable is less than A is the area under the curve, above
the base line, and to the left of the ordinate at A. The probability
that a value of the variable is between A and B is the area under
the curve, above the base line, and between the ordinates at A
and B. Since the normal distribution is a probability distribution
the total area under the curve is one.

(Sketch: normal curve with the points A, 0, and B marked on the base line.)

For the normal distribution the point 0, shown in the sketch,
is at the same time the mean, median, and mode. It is most often
referred to as the mean. For convenience in reading tables
related to the normal curve, the mean of the distribution is taken
as the origin from which values of the variable are measured.
Scale values are most conveniently expressed in standard
deviation units. When the origin of a normal probability distribution
is placed at the mean ($\mu$) and values are scaled in terms of the
standard deviation ($\sigma$), the distribution is called unit normal.
The unit normal distribution has mean zero and variance 1. In
this book the letter z will be used to denote a value of a variable
so distributed.

If X is any normally distributed variable with mean $\mu$ and
standard deviation $\sigma$, then


(2.15)    $z = \frac{X - \mu}{\sigma}$


It is convenient to attach a subscript to the letter z to indicate
what proportion of the area under the normal curve lies to the
left of an ordinate drawn at z. In the adjacent sketch 30% of the
area lies to the left of an ordinate at $z_{.30}$ and 90% to the left of an
ordinate at $z_{.90}$; therefore $z_{.30}$ is the
30th percentile of the curve and $z_{.90}$ is the 90th percentile.

(Sketch: normal curve with 30% of the area lying to the left of the ordinate at $z_{.30}$.)

Reading a Table of Normal Probability. Tables I, II and III
in the Appendix may be used for obtaining the probabilities of 
the normal distribution. 

In Table I are shown area values corresponding to given
abscissa values ranging from -4 to +4. Points to the left of
the origin have negative abscissas and these are listed in the first
column under the heading $z_\alpha$. Then $\alpha$ denotes the area to the left
of an ordinate at $z_\alpha$. Points to the right of the origin have positive
abscissas and these are listed in the second column under the
heading $z_{1-\alpha}$. The area to the left of an ordinate at $z_{1-\alpha}$ is $1 - \alpha$
and the area to the right is $\alpha$. Thus $z_\alpha = -z_{1-\alpha}$. The area
outside the ordinates at $z_\alpha$ and $z_{1-\alpha}$ is $2\alpha$; the area between these two
ordinates is $1 - 2\alpha$. Thus the table indicates that the area to
the left of an ordinate at z = -.9 is .200. This means that the
20th percentile of the normal curve is -.9. This table is
convenient to use when a value of z is known and the corresponding
probability value is required.




Verify the following statements by identifying the appropriate 
entry in Table I. 


P(z < -2.5) = .006        P(z < -2.1) = .018
P(z <  2.5) = .994        P(z <  2.1) = .982
P(z <  -.4) = .345        P(z < -2.6) = .005
P(z <   .4) = .655        P(z <  2.6) = .995


The percentile rank of z = -2.5 is .6
The percentile rank of z = .4 is 65.5


z = 2.1 is the 98th percentile.
z = -2.1 is the 2d percentile.
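Readers without the Appendix tables at hand can reproduce such entries by computation. The sketch below is a hedged illustration: the function names are assumptions, the area to the left of z is obtained from the error function in Python's `math` module, and the raw score of 130 with mean 100 and standard deviation 15 is a hypothetical example of Formula (2.15).

```python
from math import erf, sqrt

def standardize(x, mu, sigma):
    """Formula (2.15): express a value in standard deviation units."""
    return (x - mu) / sigma

def normal_cdf(z):
    """Area under the unit normal curve to the left of the ordinate at z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Areas to the left of selected z values, rounded to three places as in Table I.
for z in (-2.5, 2.5, -0.4, 0.4, -2.1, 2.1, -2.6, 2.6):
    print(f"P(z < {z:5.1f}) = {normal_cdf(z):.3f}")

# Hypothetical raw score: X = 130 in a distribution with mean 100, s.d. 15.
z_score = standardize(130, 100, 15)
print(z_score, round(normal_cdf(z_score), 3))
```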


In Table II the roles of the two columns in Table I are reversed, 
making Table II particularly convenient to use when a probability
value is known and the corresponding value of z is required.
Thus, if we want to know the 30th percentile of the unit normal
curve we want a value of z such that 30% of the area lies to the
left of the ordinate at that point, and this is read directly from
the table as $z_{.30} = -.524$. If we had tried to obtain this same
value from Table I we could only have noted that .30 which is
the given area is between the tabulated values .308 and .274 and
that therefore the required value of z is between -.5 and -.6.
Interpolation is not linear in these tables and could not be relied 
upon for more than one additional digit. It is much less laborious 
and more accurate to use Table II than to try to interpolate in 
Table I and vice versa. Verify the following statements by iden- 
tifying the appropriate entry in Table II, then look at Table I 
to see how nearly you could have estimated the same value 
from it. 

For the unit normal curve 


the 10th percentile is — 1.282 
the 44th percentile is — .151 
the 50th percentile is 0 

the 95th percentile is 1.645 
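The inverse reading performed with Table II, from a known area to the corresponding z, can likewise be sketched in code. This is an illustration only; it relies on `statistics.NormalDist.inv_cdf` from the Python standard library, and the helper name `z_sub` is an assumption standing for the subscripted z of this text.

```python
from statistics import NormalDist

unit_normal = NormalDist()  # mean 0, standard deviation 1

def z_sub(area):
    """Value of z with the given proportion of area lying to its left."""
    return unit_normal.inv_cdf(area)

for area in (0.10, 0.30, 0.44, 0.50, 0.95):
    print(f"z_{area:.2f} = {z_sub(area):.3f}")
```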


Table III is set up in a fashion similar to that of Table I but
with these modifications. The z column has been headed $x/\sigma$
and entries are at intervals of .01 instead of .1. It is to be
understood that $x/\sigma$ has here the same meaning as $z_\alpha$ in Table I. Areas
are expressed as areas between the ordinate at z and the ordinate
at the mean instead of areas to the left of the ordinate at z.
Thus for z = -1.20 Table I would give the area as .115 and Table
III would give it as .5 - .3849 = .1151. For z = -.65,
interpolation in Table I would give the area as ½(.274 + .242) = .258
while Table III would give .5 - .2422 = .2578. To estimate the
same value from Table II we could note that the tabulated entry
nearest to -.65 is -.6745 and the corresponding area is .25.
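The relation between the two ways of tabulating area can be made explicit. In the sketch below, which is offered as an assumption-laden illustration with hypothetical function names, `area_mean_to_z` plays the part of a Table III entry and `area_left_of` converts it to the area to the left of z, as in Table I.

```python
from statistics import NormalDist

unit_normal = NormalDist()

def area_mean_to_z(z):
    """Area between the ordinate at the mean and the ordinate at z."""
    return abs(unit_normal.cdf(z) - 0.5)

def area_left_of(z):
    """Area to the left of z, built from the mean-to-z area."""
    area = area_mean_to_z(z)
    return 0.5 - area if z < 0 else 0.5 + area

for z in (-1.20, -0.65):
    print(z, round(area_mean_to_z(z), 4), round(area_left_of(z), 4))
```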

The reader should be warned that practice in texts is not uni- 
form as to what areas of the normal curve are tabulated, and so 
a full understanding of the relations involved is necessary if one 
is to make correct use of whatever tables are at hand. In most 
tables the maximum ordinate is given as .3989 as in Table III, 
but in some tables the maximum ordinate is given as 1.00 and 
all other ordinates are expressed as proportions of the maximum. 
Such a table is a table of the normal curve but not of the unit 
normal. 

It is important that the reader understand certain relation- 
ships, which will be briefly stated here. Exercise 2.6 provides 
practice in recognizing these. 

The subscript of z denotes an area while z denotes a distance 
on the baseline measured from the mean, or it denotes a point on 
the baseline at distance z from the mean. 


If the subscript is less than .50, z is negative.
If the subscript is exactly .50, z is zero.
If the subscript is greater than .50, z is positive.


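In subsequent chapters the small Greek letter $\alpha$ (alpha) will
often be used to designate a small area in one of the tails of a
probability curve, as in the adjacent
sketch. Then

$z_\alpha = -z_{1-\alpha}$
$-z_\alpha = z_{1-\alpha}$
$z_\alpha + z_{1-\alpha} = 0$

(Sketch: normal curve with an area $\alpha$ in each tail, the ordinates marked $z_\alpha$, 0, and $z_{1-\alpha}$.)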


The reader must be warned that the numerical subscript is
used with a somewhat different meaning in most other texts. In
later chapters we shall deal with the probability distributions of
several other variables, in particular with variables called $t$, $\chi^2$,
and $F$. In the symbolism used in this text (which is the same as
that used by Dixon and Massey in An Introduction to Statistical
Analysis, McGraw-Hill, 1951), $t_{.05}$, $\chi^2_{.05}$, and $F_{.05}$ each means the
fifth percentile of the corresponding probability distribution. In
the symbolism commonly used in other texts, $t_{.05}$ is the 97.5
percentile, $\chi^2_{.05}$ and $F_{.05}$ are the 95th percentiles, while the 5th
percentiles of those distributions are denoted as $-t_{.10}$, $\chi^2_{.95}$, and $F_{.95}$.
Obviously the notation introduced by Dixon and Massey and
followed in this text is much more consistent and less confusing.


EXERCISE 2.6 

1. By reference to Tables I, II, or III verify each of the following 
statements and indicate the number of the table which can be most con- 
veniently used. 


a. P(z > 2.05) = .5 - .4798 = .0202 (III)
b. P(z < -.2) = .421 (I)
c. P(0 < z < 1.32) = .4066 (III)
d. P(-.64 < z < 0) = .2389 (III)
e. P(z < .5) = .692 (I)
f. P(z < .52) = .5 + .1985 = .6985 (III)
g. P(z > .57) = .5 - .2157 = .2843 (III)
h. P(-.3 < z < .3) = 1 - 2(.382) = .236 (I)
                   = 2(.1179) = .2358 (III)
i. P(z < -.3 or z > .3) = 2(.382) = .764 (I)
                        = 1 - 2(.1179) = .7642 (III)
j. P(z < -2.5 or z > 2.5) = 2(.006) = .012 (I)
                          = 1 - 2(.4938) = .0124 (III)
k. $z_{.95} = 1.645$ (II)
l. $z_{.98} = -z_{.02} = 2.054$ (II)
m. $z_{.001} = -3.09$ (II)
n. $z_{.998} = -z_{.002} = 2.878$ (II)
2. Verify the following statements and translate them into words: 
a. 2.30 = — 20 f. га 2 = 0 
b. 212 = — 2.8 g. 28 20 = 0 
С. 2.90 = — 210 h. 25 = — 2.75 
d. 25 = — 2.05 1. 2.995 — 2,005 
е. 220+ 2.80 = 0 j. 203+ 2.97 = 0 
3. Without reference to any table, state the value of the following: 
a. P(z < гь) i P(z > гм) 
b. Plz > zs) j P(e > 2.18) 
с. P(z < 2.9) К. P(e < zw or > 2.95) 
d. P(z < 2 < za) 1. P(z < ге) 
e. P(zo < 2) m. P(z > 25) 
f. P(za < 2.99) п. P(z < zw or 2 > 2.999) 
g. P(z > 2) о. P(z < zw or 2 > zs) 
h, Р(20 € 2 < za) 


4. If A represents some particular number find the value of A for which 
each of the following statements is satisfied. 

a. P(z > A) = .15                    f. P(−A < z < A) = .95 
b. P(z < A) = .04                    g. P(z > A) + P(z < −A) = .01 
c. P(−A < z < A) = .30               h. P(z > A) = .995 
d. P(z > A) + P(z < −A) = .50        i. P(z < A) = .975 
e. P(z > A) = .05                    j. P(−A < z < A) = .50 


Computing Binomial Probabilities by Use of Normal Proba- 
bility Tables. Since the normal distribution approximates closely 
the binomial, the table of areas under the normal curve may be 
used to compute probabilities of the binomial distribution. 

Consider the following problem. In 100 fair throws of an un- 
biased coin what is the probability that heads will be exposed 
60 times or more? 

This is a sample of 100 from a population of two classes for 
which P = Q = .5. The exact probability distribution is given by 
the terms of the binomial for which N = 100 and P = .5. To 
calculate exactly the probability called for would require adding 
the terms 

$$\binom{100}{100}\left(\tfrac{1}{2}\right)^{100} + \binom{100}{99}\left(\tfrac{1}{2}\right)^{100} + \cdots + \binom{100}{61}\left(\tfrac{1}{2}\right)^{100} + \binom{100}{60}\left(\tfrac{1}{2}\right)^{100}$$

To avoid this formidable task, the normal curve may be used to 
obtain an approximation, and a very good approximation. This 
approximation is accomplished by means of the statistic 

(2.16)  $$z = \frac{p - P}{\sqrt{\dfrac{PQ}{N}}}$$


Notice that this statistic is zero at the mean of the distribution 
when p= P; it has unit standard deviation because of its de- 
nominator, and its distribution is approximately normal. Con- 
sequently it can be treated like the z of the unit normal distribution. 

Returning now to the problem stated at the beginning of this 
section, we have P = .5, p = .6, and N = 100, hence 

$$z = \frac{.6 - .5}{\sqrt{\dfrac{(.5)(.5)}{100}}} = \frac{.1}{.05} = 2$$

Table I shows that P(z > 2) = .023. Hence the chance that 60 
or more throws will show heads is approximately .023. 


Consider also the following example: A mental test of 80 
multiple-choice items each having 5 alternatives is presented to 
a subject who knows nothing whatever about the topic and guesses 
the answers to all of the questions. What is the chance that he 
will answer 24 or more of the questions correctly? 

Here P = 1/5 = .2, p = 24/80 = .3, and N = 80, so 

$$z = \frac{.3 - .2}{\sqrt{\dfrac{(.2)(.8)}{80}}} = 2.24$$

The chance that the subject will answer 24 or more questions 
correctly by guessing is P(z > 2.24) = .0125 by Table III. 
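The two worked problems can be reproduced with a short sketch. It assumes Python's standard library in place of Tables I and III, and the function names `normal_tail` and `exact_tail` are illustrative only. Like the text, it applies Formula (2.16) without any continuity adjustment, so the exact binomial sum printed alongside will differ somewhat from the normal figure.

```python
from math import comb, sqrt
from statistics import NormalDist

def normal_tail(successes, n, P):
    """Approximate the chance of a proportion at least successes/n using
    z = (p - P) / sqrt(PQ/N), Formula (2.16)."""
    p = successes / n
    z = (p - P) / sqrt(P * (1 - P) / n)
    return z, 1 - NormalDist().cdf(z)

def exact_tail(successes, n, P):
    """Exact binomial probability of `successes` or more in n trials."""
    return sum(comb(n, k) * P**k * (1 - P)**(n - k)
               for k in range(successes, n + 1))

for label, x, n, P in [("coin", 60, 100, 0.5), ("test", 24, 80, 0.2)]:
    z, approx = normal_tail(x, n, P)
    print(f"{label}: z = {z:.2f}, normal tail = {approx:.4f}, "
          f"exact = {exact_tail(x, n, P):.4f}")
```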

The Closeness of Approximation of the Normal Curve to the 
Binomial. In order to see how well the normal curve fits a binomial 
distribution we may consider a small sample in which actual 
probabilities can be computed and compared. Consider the 
probability distribution of X in a sample of 10 cases where X 
runs from 0 to 10 with probabilities given by the binomial 
distribution for which P = .5 and N = 10. X is a discrete variable 
with probabilities as given in the adjacent list. For the sake of 
comparison of binomial and normal distributions, X is represented 
in Figure 2-10 as a continuous variable. Thus instead of the point 
X = 6 we have the interval 5.5 < X < 6.5, and instead of X = 0 
we have the interval −.5 < X < .5. 

   X    P(X) 
  10    .001 
   9    .010 
   8    .044 
   7    .117 
   6    .205 
   5    .246 
   4    .205 
   3    .117 
   2    .044 
   1    .010 
   0    .001 

Fig. 2-10. Binomial probability distribution with P = .5 and 
N = 10 and normal distribution with mean = PN = 5 and standard 
deviation = √(NPQ) = 1.58. 

In such comparisons it is more effective to use 
cumulative distributions. The cumulative binomial 
probability that X will be no greater than a given value is shown 
in the second column of Table 2.4. Thus 

P(X = 0 or 1 or 2 or 3) = P(X < 3.5) 
                        = .001 + .010 + .044 + .117 = .172 


To obtain the corresponding probabilities on the assumption that 
X is normally distributed we must first find the mean and 
standard deviation of the binomial distribution. By Formula (2.10), 
μ = 5 and by Formula (2.12) σ = √2.5 = 1.58. We must next 
find the z value corresponding to each division point (such as 
.5, 1.5, 2.5, etc.) between intervals. For example, if 

$$X = 1.5, \qquad z = \frac{1.5 - 5}{1.58} = -2.21.$$

These z values are entered in column 3 of Table 2.4. The normal 
probability, that is, the area to the left of the ordinate at z, is 
shown in the final column of this table. 

The two cumulative probability distributions tabulated in 
Table 2.4 are shown graphically in Figure 2-11. The values of 
X listed in Table 2.4 are the midpoints of intervals in the graph. 
At these points the smooth curve and the step curve are very 
close together, indicating that the smooth curve provides a good 
approximation to the step curve. 

Fig. 2-11. Cumulative binomial probability distribution with P = .5 
and N = 10 and cumulative normal distribution with mean = PN = 5, and 
standard deviation = √(NPQ) = 1.58. 


TABLE 2.4 Cumulative Probability for the Binomial Distribution with P = .5 
and N = 10 and for the Normal Distribution with μ = 5 and σ = 1.58 
X0 = Point Midway between Two Adjacent Values of Np 

          Cumulative                          Cumulative 
  X0      binomial       z = (X0 − 5)/1.58    normal 
          probability                         probability 
  9.5       .999               2.85             .998 
  8.5       .989               2.21             .986 
  7.5       .945               1.58             .943 
  6.5       .828                .95             .829 
  5.5       .623                .32             .625 
  4.5       .377               −.32             .375 
  3.5       .172               −.95             .171 
  2.5       .055              −1.58             .057 
  1.5       .011              −2.21             .014 
   .5       .001              −2.85             .002 
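The construction of Table 2.4 can be sketched in a few lines. This is an illustration under stated assumptions: the function name `cumulative_table` is hypothetical, and `statistics.NormalDist` stands in for the printed normal table. Because the printed table rounds z to two decimals before looking up the area, a value computed here may occasionally differ from the table in the third decimal place.

```python
from math import comb, sqrt
from statistics import NormalDist

def cumulative_table(n, P):
    """Rows (X0, cumulative binomial, z, cumulative normal) at the division
    points X0 = .5, 1.5, ..., n - .5."""
    mu = n * P                          # Formula (2.10)
    sigma = sqrt(n * P * (1 - P))       # Formula (2.12)
    rows, running = [], 0.0
    for k in range(n):
        running += comb(n, k) * P**k * (1 - P)**(n - k)   # P(X <= k)
        x0 = k + 0.5
        z = (x0 - mu) / sigma
        rows.append((x0, running, z, NormalDist().cdf(z)))
    return rows

# Print from the largest division point down, as in the table.
for x0, binom, z, normal in reversed(cumulative_table(10, 0.5)):
    print(f"{x0:4.1f}   {binom:.3f}   {z:6.2f}   {normal:.3f}")
```

Calling `cumulative_table(10, 0.8)` gives the corresponding rows for the skewed case P = .8.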


By way of contrast, the same treatment has been applied to 
the binomial with P = .8 and N = 10. The probability histogram 
and superimposed normal curve are shown in Figure 2-12, the 
cumulative probabilities are listed in Table 2.5, and cumulative 
probability curves drawn in Figure 2-13. Examination of Figure 
2-12 reveals that the normal curve is symmetrical while the 
histogram is not. Examination of Figure 2-13 reveals that at the 
midpoints of intervals the smooth curve and the step curve do not 
coincide as well as they did in Figure 2-11. Even here, however, 
the discrepancy is less than many people would have expected. 

TABLE 2.5 Cumulative Probability for the Binomial Distribution with P = .8 
and N = 10 and for the Normal Distribution with μ = 8 and σ = 1.26 
X0 = Point Midway between Two Adjacent Values of Np 

          Cumulative                           Cumulative 
  X0      binomial       z = (X0 − 8)/1.265    normal 
          probability                          probability 
  9.5       .892               1.19              .883 
  8.5       .624                .40              .655 
  7.5       .322               −.40              .345 
  6.5       .121              −1.19              .117 
  5.5       .033              −1.98              .024 
  4.5       .007              −2.77              .003 
  3.5       .001              −3.56              .000 
  2.5       .000              −4.35              .000 
  1.5       .000              −5.14              .000 
   .5       .000              −5.93              .000 

Fig. 2-12. Binomial probability distribution with P = .8 and N = 10 and 
normal distribution with mean = PN = 8 and standard deviation = √(NPQ) 
= 1.26. 
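The contrast between the symmetrical case (P = .5) and the skewed case (P = .8) can be put into a single number. The sketch below is one possible summary, not a method used in the text: it reports the largest gap between the cumulative binomial and cumulative normal probabilities over the division points. The name `max_gap` is illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

def max_gap(n, P):
    """Largest |cumulative binomial - cumulative normal| over the division
    points .5, 1.5, ..., n - .5."""
    normal = NormalDist(mu=n * P, sigma=sqrt(n * P * (1 - P)))
    running, worst = 0.0, 0.0
    for k in range(n):
        running += comb(n, k) * P**k * (1 - P)**(n - k)   # P(X <= k)
        worst = max(worst, abs(running - normal.cdf(k + 0.5)))
    return worst

for P in (0.5, 0.8):
    print(f"P = {P}: largest discrepancy = {max_gap(10, P):.4f}")
```

The gap for P = .8 is several times that for P = .5, yet it remains small in absolute terms.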


Fig. 2-13. Cumulative binomial probability distribution with P = .8 
and N = 10 and cumulative normal distribution with mean = PN = 8, and 
standard deviation = √(NPQ) = 1.26. 


REVIEW EXERCISE 

1. This chapter has introduced a number of technical terms which 
are new to most readers. To phrase a completely satisfactory formal 
definition of some of these terms would require more erudition than is to 
be expected of a student at this stage of development. However you should 
have a fairly clear idea about the meaning of the words and phrases in 
the following list: 


additive law for probabilities          parameter 
binomial distribution                   population 
continuous variable                     population distribution 
cumulative probability                  probability density 
dichotomy                               probability distribution 
dichotomous population                  probability of a class 
discrete variable                       random observation 
exhaustive classes                      random sample 
expectation                             random selection 
expected value                          random variable 
factorial                               sampling distribution 
independent observations                statistic 
multiplication law for probabilities    step curve 
mutually exclusive classes              unit normal curve 
normal curve                            universe 
observations                            variable 


2. Certain technical terms have been used in this chapter without 
explanation because the reader is presumed to be familiar with them. If 
any of these are not clear, you should look them up in an elementary text 
in statistics: 


abscissa                                mode 
histogram                               ordinate 
mean                                    standard deviation 
median                                  variance 


3. Certain symbols introduced in this chapter will be used throughout 
the text. If their meaning is not clear at this point you should review 
them and rework those exercises which make use of them. Such symbols 


N 
are: (9 61, р, P, Q, u, 0, Z, г, E(X), P(X = 5), P(X is in C; = Pi) 


3 Inferences Concerning Proportions 


The general logic of statistical inference was outlined 
somewhat sketchily in Chapter 1, and was there applied to some 
very simple problems in which the statistic computed was either 
the number of observations having a certain characteristic, or the 
proportion of observations having that characteristic. The simplest 
aspects of probability were developed in Chapter 2 as well as the 
technique of computing binomial probabilities and reading proba- 
bilities from a normal probability table. The concepts and 
methods of Chapter 2 will now be applied to a greater variety of 
problems of statistical inference related to proportions or per cents. 
(If 16 out of 25 individuals possess a given trait the proportion 
having the trait is 16/25 = .64 and the per cent having it is 
100(16/25) = 64.) 


In many statistical studies the crucial issue relates to the size 
of a per cent or the comparison of two or more per cents. A new 
drug is discovered which appears to have curative properties for 
a hitherto baffling disease. What per cent of cases treated by the 
new drug recover? In what per cent of cases does the drug pro- 
duce harmful results? Is the per cent of cases helped by this 
drug greater than the per cent helped by a remedy already in use? 
In a manufacturing process the company is concerned with the 
per cent of defective articles produced and must continually meas- 
ure samples in order to keep that per cent within bounds. If the 
per cent becomes greater than some predetermined value it may 
be presumed that something has gone wrong with the machinery 
or that some other aspect of the manufacturing process should be 
investigated. A distributor of garden seed must make continual 
tests of the seed he sells. If he tested all of it he would have 
nothing left to sell, so he examines samples. Among the things he 
wants to know is the per cent of seeds of a given type which can 
be expected to germinate, and whether a larger per cent will 
germinate under one or another type of cultivation. The variety of 
practical situations in which samples are used to answer questions 
about population per cents is endless. Any reader can quickly 
think of many such in his own field of study. 

The content which may be explored in problems of the kind 
discussed in this chapter is practically inexhaustible. The types 
of such problems are relatively few. Therefore, it seems eco- 
nomical of the student’s effort to organize the presentation in 
terms of types of problems with one illustration for each type, 
and to expect that the reader can apply the method to similar 
situations in the content of his own field. 

Test of Hypothesis P = .5 by Use of the Binomial Distribution. 
In order to make the discussion concrete let us return to the prob- 
lem of selecting the more intelligent of two persons by comparing 
their photographs. This problem was introduced at the end of 
Chapter 1 by considering an experiment based on five pairs of 
photographs. The analysis will be more interesting if the experi- 
ment is now extended to include more pairs of photographs, say 10. 

It was pointed out in Chapter 2 that the population studied 
by means of this experiment is a dichotomous or two-class popula- 
tion. The parameter P of the population is the probability of 
correctly selecting the more intelligent member of any random pair. 

A variety of hypotheses might be formulated about this un- 
known P. Let us consider first the hypothesis that the experi- 
menter has no ability to distinguish levels of intelligence by 
comparing photographs, so that a correct comparison is as likely 
as an incorrect comparison. In terms of the parameter the hy- 
pothesis is P = .5. 

The consequences of this hypothesis are stated in terms of 
the sampling distribution of the statistic p which is, for an ob- 
served sample, the proportion of comparisons in which the correct 
selection is made. This distribution of p appears in Table 3.1. 
It is a special case of the binomial distribution shown in Table 
2.1 when N = 10 and P=.5. The reader will note that the sam- 
pling distribution of p was constructed without knowledge of the 
proportion observed in any actual sample. The distribution sup- 
plies a statement of the probability, if the hypothesis is true, of 
obtaining in a sample of the given size one of the eleven possible 
values of p. 

The hypothesis P = .5 can now be tested by actually perform- 
ing the experiment with ten pairs of photographs and comparing 
the observed proportion of correct selections with the sampling 
distribution of p in Table 3.1. 




Under the hypothesis that correct selections will be made in 
approximately half the comparisons (P = .5) it is reasonable to 
anticipate that in a sample of 10 cases an observed proportion 
p will not be far from .5. If it is very far from .5, say 0 or 1.0, 
most people intuitively feel that the observation contradicts the 
hypothesis, throws doubts upon it, renders it implausible. But 
intuitively, different people might hold different opinions as to 
what “very far from .5” means. The function of the sampling 
distribution is to provide an aid to intuition and to regularize 
intuitive decisions. “Very far from P” will take on different 
meanings as sample size changes and as P changes and so cannot 
provide a dependable rule for dealing with hypotheses. However, 
it can be agreed that an observed sample and a hypothesis are to 
be considered in conflict whenever, under the hypothesis, the 
probability of obtaining a sample as extreme as the one observed 
is not greater than some specified small number, say 1/20 = .05 
or 1/100 = .01. In subsequent chapters various sampling dis- 
tributions will be considered. Regardless of the nature of the 
sampling distribution, or the statistic to which it relates, one may 
always agree to consider that the extreme observations having 
this specified small probability warrant rejecting the hypothesis 
tested. 

TABLE 3.1 The Sampling Distribution of an Observed Proportion p in 
Samples of 10 Cases from a Population in which P = .5 

         Probability for       Probability for 
   p     any value of P        P = .5 
  1.0    P^10                    1/1024 = .001 
   .9    10 P^9 Q               10/1024 = .010 
   .8    45 P^8 Q^2             45/1024 = .044 
   .7    120 P^7 Q^3           120/1024 = .117 
   .6    210 P^6 Q^4           210/1024 = .205 
   .5    252 P^5 Q^5           252/1024 = .246 
   .4    210 P^4 Q^6           210/1024 = .205 
   .3    120 P^3 Q^7           120/1024 = .117 
   .2    45 P^2 Q^8             45/1024 = .044 
   .1    10 P Q^9               10/1024 = .010 
   0     Q^10                    1/1024 = .001 
                                          1.000 
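The entries of Table 3.1 follow directly from the binomial terms. The sketch below is an illustration, with the function name `sampling_distribution` chosen here; it uses exact fractions so that the entries for P = .5 come out as multiples of 1/1024. Note that Python prints fractions in lowest terms, so 10/1024 appears as 5/512.

```python
from fractions import Fraction
from math import comb

def sampling_distribution(n, P):
    """Map each possible proportion p = x/n to its binomial probability
    C(n, x) P^x Q^(n - x), where Q = 1 - P."""
    P = Fraction(P)
    return {Fraction(x, n): comb(n, x) * P**x * (1 - P)**(n - x)
            for x in range(n + 1)}

dist = sampling_distribution(10, Fraction(1, 2))
for p in sorted(dist, reverse=True):
    print(f"{float(p):.1f}   {str(dist[p]):>9}   {float(dist[p]):.3f}")
```

The distribution is built from the hypothesis alone, without reference to any observed sample, which is the point made in the text.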


According to Table 3.1 we see that under the hypothesis 
P = .5 the probability that p = 0 or 1.0 is 

P(p = 0 or 1.0 | N = 10 and P = .5) = 1/1024 + 1/1024 = .002 

Also the probability that p will be as extreme as .1 or .9 is the 
probability that p will be either 0, .1, .9 or 1.0 and this is 

P(p = 0, .1, .9, or 1.0 | N = 10 and P = .5) 
     = 1/1024 + 10/1024 + 10/1024 + 1/1024 = 22/1024 = .022 


This probability is so small that practically all research workers 
would agree that if one of these values of p is observed in a sample 
the hypothesis Р = .5 should be rejected. Values of p which, 
when observed in a sample, lead to rejection of a hypothesis will 
be called rejection values. If the hypothesis P = .5 is rejected 
whenever one of these values of p appears in a sample, then in 
the long run a true hypothesis will be rejected in 22 out of 1000 
samples. Most people consider that 22/1024 = .022 is a small risk 
to take. These four values of p are said to constitute a critical 
region, or a region of rejection, or region of significance. The 
remaining possible values of p, that is, p = .2, .3, .4, .5, .6, .7, and 
.8 are then said to constitute a region of acceptance. The proba- 
bility of .022 is called the level of significance or the size of the region 
of rejection. The probability 1.00 — .022 = .978 is then the size 
of the region of acceptance. 

Some readers may be inclined to include the points .2 and .8 
in the region of rejection. The probability of a sample as extreme 
as these under the hypothesis P = .5 is 


P(p = 0, .1, .2, .8, .9, or 1.0) = 112/1024 = .11 


Most research workers are reluctant to adopt a rule which would 
cause them to reject a true hypothesis in 11 out of 100 samples 
and so they would prefer to classify .2 and .8 as acceptance values. 

If a sampling distribution is continuous, as for example the 
normal probability curve, the level of significance can be chosen 
arbitrarily, and for this purpose some small conveniently rounded 
number such as 1/20 or 1/100 is commonly chosen. If a sampling 
distribution is discrete such an arbitrarily chosen level of 
significance may not conform to any probability which can possibly 
be obtained by cumulating probabilities at the extremes of the 
distribution. Thus we have just seen that in the present prob- 
lem, the level of significance can be set at .022 or at .11 and not 
at any intermediate value. If the research worker wants to work 
at the .05 level, about the best he can do in such cases is to say he 
will make the size of the region of rejection not larger than .05 
and the size of the region of acceptance not smaller than .95. 
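The restriction on attainable significance levels is easy to see by listing them. The sketch below assumes a symmetric two-tailed region, as in the present problem with P = .5; the function name `attainable_levels` is illustrative.

```python
from math import comb

def attainable_levels(n, P=0.5):
    """Sizes of the two-tailed regions {x <= k or x >= n - k} for k = 0, 1, ...
    These are the only significance levels a symmetric region can have."""
    prob = [comb(n, x) * P**x * (1 - P)**(n - x) for x in range(n + 1)]
    levels = []
    for k in range(n // 2):
        size = sum(prob[: k + 1]) + sum(prob[n - k:])
        levels.append((k, size))
    return levels

levels = attainable_levels(10)
for k, size in levels:
    print(f"reject when x <= {k} or x >= {10 - k}: size = {size:.3f}")

# Largest region whose size does not exceed the nominal .05 level.
chosen = max((k, size) for k, size in levels if size <= 0.05)
print("chosen region:", chosen)
```

For ten cases the sizes jump from about .02 to about .11, so a worker aiming at the .05 level settles for the region of size .022.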



In Figure 3-1, the region of rejection consists of the points 
marked with heavy dots. The region of acceptance consists of 
the points marked with circles. The size of each region is repre- 
sented by the sum of the ordinates drawn at points in that region. 
Suppose that among comparisons of 10 pairs of photographs, cor- 
rect selections have been made in 9 pairs. The observed value 
p = .9 falls in the critical region, the region of rejection. There- 
fore the hypothesis P = .5 must be relinquished. 


Fig. 3-1. Rejection values ● ● and acceptance values ○ ○ for 
hypothesis P = .5 and associated probabilities for samples of 
10 cases. (Vertical axis: Probability; horizontal axis: Sample 
Value of p.) 


The critical region can be located directly in the cumulative 
probability graph of Figure 3-2. Suppose you have decided to 
use a critical region of size α and to locate half of it in each tail. 
For illustration, let us assume you have decided on α = .05. On 
the vertical axis mark the point corresponding to ½(.05) = .025, 
and draw a horizontal line through it. This horizontal line meets 
the step graph of Figure 3-2 at p = .2. All values of p to the left 
of this intersection, not including p = .2, form the left critical 
region with probability not greater than .025. On the vertical 
axis mark the point corresponding to 1 − ½(.05) = .975 and draw 
a horizontal line through it. This horizontal line meets the step 
graph in the point p = .8. All values of p to the right of this 
intersection, not including p = .8, form the right critical region 
with probability not greater than .025. As in Figure 3-1, heavy 
dots have been used to represent rejection values. 

Fig. 3-2. Rejection values ● ● and acceptance values ○ ○ for 
hypothesis P = .5 and associated cumulative probabilities for 
samples of 10 cases. (Vertical axis: Cumulative Probability; 
horizontal axis: Scale of p.) 

Tests of Several Hypotheses on P. Inasmuch as the 
hypothesis P = .5 has proved unacceptable, alternative hypotheses 
may be tried. Any value of P, other than .5, between 0 and 1 
may be an alternative. As the sample is finite, p can take values 
at the points 0/N, 1/N, 2/N, ..., (N − 1)/N, N/N and at no other 
points. Therefore the scale for p consists of discrete points. 
While P is a constant for any one population, we are here 
considering all possible dichotomous populations, and so the scale 
for P is a continuum extending from 0 to 1. For certain selected 
hypothetical values of P the probability distribution of p has 
been computed and is shown in Table 3.2. Each horizontal row 
of the table is a sampling distribution for which the sum would 
be 1.00 except for rounding errors. The vertical columns are not 
probability distributions. 

We shall now seek rejection values for each of these distri- 
butions and shall represent them graphically in Figure 3-3 in 
such a way as to be able to study them all at one time in one 
unified picture. 

TABLE 3.2 Sampling Distribution of an Observed Proportion p in Samples 
of 10 Cases for Selected Values of P 
(Decimal points omitted to save space) 

                 Probability of a given value of p 

 P    0   .1   .2   .3   .4   .5   .6   .7   .8   .9  1.0   Sum 
95                                  001  010  075  315  599  1000 
90                             002  011  057  194  387  349  1000 
85                        001  008  040  130  276  347  197   999 
80                   001  006  026  088  201  302  268  107   999 
75                   003  016  058  146  250  282  188  056   999 
70              001  009  037  103  200  267  233  121  028   999 
65         001  004  021  069  154  238  252  176  072  013  1000 
60         002  011  042  111  201  251  215  121  040  006   999 
55         004  023  075  160  234  238  166  076  021  003  1000 
50  001  010  044  117  205  246  205  117  044  010  001  1000 
45  003  021  076  166  238  234  160  075  023  004       1000 
40  006  040  121  215  251  201  111  042  011  002        999 
35  013  072  176  252  238  154  069  021  004  001       1000 
30  028  121  233  267  200  103  037  009  001             999 
25  056  188  282  250  146  058  016  003                  999 
20  107  268  302  201  088  026  006  001                  999 
15  197  347  276  130  040  008  001                       999 
10  349  387  194  057  011  002                           1000 
05  599  315  075  010  001                                1000 

Suppose it has been agreed that the region of rejection shall 
have probability .025 or less at each end of the distribution. Now 
let us see how this would work out for a particular row of Table 
3.2, say the row for which P = .70. The sum of the probabilities 
for p = 0, p = .1, p = .2 and p = .3 is approximately .010 which 
is considerably less than the specified .025. However if the 
probability for p = .4 is included the sum becomes .047 which is 
considerably larger than the specified .025. On Figure 3-3 heavy 
dots mark rejection values. These are so chosen that the 
probability for the region of rejection in each tail is .025 or less. 
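The rule of allowing .025 or less in each tail can be applied mechanically to any hypothetical P. The following sketch is an illustration only (the function name `rejection_values` is not from the text); it grows each tail one value at a time and stops before the tail probability would exceed the limit.

```python
from math import comb

def rejection_values(n, P, tail=0.025):
    """Counts x = Np that fall in the lower and upper rejection regions when
    each tail may hold probability at most `tail`."""
    prob = [comb(n, x) * P**x * (1 - P)**(n - x) for x in range(n + 1)]

    lower, total = [], 0.0
    for x in range(n + 1):              # extend the lower tail upward
        if total + prob[x] > tail:
            break
        total += prob[x]
        lower.append(x)

    upper, total = [], 0.0
    for x in range(n, -1, -1):          # extend the upper tail downward
        if total + prob[x] > tail:
            break
        total += prob[x]
        upper.append(x)

    return lower, sorted(upper)

for P in (0.70, 0.50, 0.47):
    print(P, rejection_values(10, P))
```

For P = .70 the lower region stops at p = .3, as found above. The upper region for that row is empty, because the single value p = 1.0 already has probability .028, which exceeds .025.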

In Figure 3-3 regions of rejection have been drawn in similar 
fashion for each of the selected values of P. The probabilities 
associated with these regions are not uniform, as can be seen by 
computing a few of them from the figures in Table 3.2. As a 
kind of compromise among these various regions, the pair of 
curved lines has been drawn on Figure 3-3. Whenever there is 
a region of probability approximately .025 the lines pass near the 
end point of that region. 

These curved lines can be used to obtain a region of rejection 
for some value of P not listed in Table 3.2, say P = .47. The 
point .47 is located on the vertical axis and a horizontal line 
drawn across the chart, cutting the curves at points for which $p_1$ 
is approximately .13 and $p_2$ approximately .85. All values 
for which .13 < p < .85 are acceptance values and all points for 
which p > .85 or p < .13 are rejection values for the hypothesis 
P = .47 at significance level .05. Actually for the discrete 
distribution this means that the values 0, .1, .9, and 1.0 are rejection 
values with probability somewhat smaller than .05. In this 
manner the range of rejection values can be obtained for any value of P. 

Fig. 3-3. Regions of rejection ● ● and acceptance for testing hypotheses 
concerning P on evidence from a sample of 10 cases at .05 level of significance. 
(Vertical axis: Population Value of P; horizontal axis: Sample Value of p.) 

A sample with proportion p drawn from a population with 
proportion P would be represented on Figure 3-3 by a point with 
coordinates (p, P). Such a point is called a sample point. Locate 
the following sample points and note that all of them are between 
the pair of curved lines and near the upper line: 

(p = .4, P = .7), (p = .6, P = .8), (p = .8, P = .9). 

Locate the following sample points, all of which are below the 
lower curved line: 

(p = .6, P = .2), (p = .8, P = .3), (p = .4, P = .05). 

Locate the following sample points all of which are above the 
upper curved line: 

(p = .3, P = .9), (p = .5, P = .85), (p = .6, P = .95). 


The area between the curved lines in Figure 3-3 is a region 
of acceptance and the area outside them a region of rejection, 
or critical region. Whenever the two coordinates of a sample 
point (p, P) are such that the point falls in the region of rejection, 
the observed value p is considered to furnish evidence justifying 
the rejection of the hypothesis concerning P. Thus the figure 
furnishes us a short cut to the testing of hypotheses about pro- 
portions. In the problem concerning the judgment of intelligence 
from photographs, suppose the experimenter is correct in 9 out 
of 10 pairs, so p = .9. The sample point for which p = .9 and 
P = .5 lies in the region of rejection. So does every sample point 
for which p = .9 and P < .5. Sample points for which p = .9 and 
.55 < P < .98 (approximately) are in the region of acceptance. 


EXERCISE 3.1 
1. Test each hypothesis (a) to (f) by locating a point on Figure 3-3. 
б (0) Р= 45 ()P-2 (е) Р=.57 
HN-10adp-4 (ә) р=-9 (8 Р=.65 (f) Р=.13 
24 Е (а) P = .53 (c) P=16 (евр es 
HN-10adp-5 уро) (dP-73 (0 Р= 


@Р=15 (c) P=0t ()P-.5 
HN-10amdp-2, (ур 61 (@)P=01 ()P-.4 


2. From the answers to the three parts of question 1, formulate a 
general statement as to the range of hypotheses acceptable at the .05 level 
when N = 10 and (a) p = .4; (b) p = .5; (c) p = .2. 


Estimation of P. To this point we have been concerned with 
tests of hypotheses about the value of the population proportion, 
P. A problem which is perhaps of greater interest is to find from 
the information in the sample an estimate of the value of P. We 
shall discuss two methods of estimating P, (1) by a single value 
and (2) by an interval. 

Estimation by a Single Value. On page 28, it was found that 
E(p) = P. This means that if an unlimited number of samples 
were drawn and p calculated for each of them, even if the number 
of cases in each sample were very small, the average of all the 
computed values of p would be P. Thus, even though one must 
expect to be wrong in nearly every instance, he may expect to be 
right on the average, in the long run. 

If a statistic has a population parameter as its expected value, 
the statistic is called an unbiased estimate of that parameter. In 
later chapters we shall meet statistics which do not possess this 
highly desirable quality. 

The sample proportion is also called a consistent estimate of 
the population proportion because for large samples the value 
of p is likely to be close to P. In Chapter 2, the standard 
deviation of p was given by Formula (2.14) as $\sigma_p = \sqrt{PQ/N}$. To see 
what this means in relation to the value of N, consider samples 
of 25, 100, and 1000 cases from a population in which P is 
actually .5. 

If N = 25, $\sigma_p$ = .1 
If N = 100, $\sigma_p$ = .05 
If N = 1000, $\sigma_p$ = .0158 

consequently 

P(.30 < p < .70 | N = 25) = .95 
P(.40 < p < .60 | N = 100) = .95 
P(.47 < p < .53 | N = 1000) = .95 


To put this more concretely, suppose a poll is conducted to 
determine what proportion of residents of a community are in 
favor of a particular issue. If half of them are actually in favor 
of the issue (P = .5), there is probability .05 that in a sample of 
25 cases the observed proportion p will be in error by .20 or more, 
that in a sample of 100 cases it will be in error by .10 or more, 
and that in a sample of 1000 cases it will be in error by .03 or more. 
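The three standard deviations and the corresponding .95 ranges can be recomputed as follows. This is a sketch assuming the normal approximation throughout, which is only rough for a sample as small as 25; the function name `sigma_p` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def sigma_p(P, n):
    """Standard deviation of a sample proportion, Formula (2.14): sqrt(PQ/N)."""
    return sqrt(P * (1 - P) / n)

z_975 = NormalDist().inv_cdf(0.975)     # about 1.96; cuts off .025 in the upper tail

for n in (25, 100, 1000):
    s = sigma_p(0.5, n)
    half_width = z_975 * s
    print(f"N = {n:4d}: sigma_p = {s:.4f}, "
          f"central .95 range = {0.5 - half_width:.2f} to {0.5 + half_width:.2f}")
```

Quadrupling the sample size from 25 to 100 halves the standard deviation, since N enters under the square root.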

Interval Estimate. A statistic with the desirable qualities of 
unbiasedness and consistency provides an estimate of the param- 
eter which may be assumed to be numerically close to that 
parameter. An estimate of a parameter which is often more 
satisfactory is one that uses two statistics, unequal in value, which 
jointly provide an interval estimate of the parameter. 

The parameter is stated as lying between these statistics in 
much the same way that one might say “I feel confident John's 
age is between 25 and 30.” There are two important differences 
between this somewhat casual estimate of John’s age as lying in 
the interval 25-30 and the statistical intervals about to be 
discussed: (1) In the statistical estimate the degree of confidence 
will be decided upon before the interval is obtained and will be 
described in numerical terms, whereas in the popular expression 
the word “confident” means different things to different people. 
(2) In the statistical estimate, the numbers which define the in- 
terval are computed from the data by a rule which guarantees 
the degree of confidence to be placed in the interval estimate, 
whereas in the popular expression the numbers are more or less 
impressionistic. 

If one makes the interval wider, he can have greater confidence 
that his statement is correct. Thus the estimate that “John’s 
age is between 5 and 95” might be made with practically complete 
confidence but the interval would be so wide as to be of little 
practical value; the estimate that “John’s age is between 26 years, 
6 months, and 26 years, 7 months,” could be made with far less 
confidence if John’s age is actually unknown. 

An interval estimate is a statement about a parameter such 
as “P is between .3 and .5.” The statement is either true or false 
and we do not know which, but the methods by which the 
statement is obtained furnish a measure of the confidence with which 
the assertion may be made. That measure of confidence, called 
a confidence coefficient, is usually a value between .90 and 1.00. 
After this confidence coefficient has been chosen, two statistics 
are computed from the sample, both from the same sample. For 
convenience we shall call these two numbers A and B, with A 
less than B. Then the interval estimate of P is the statement 

A < P < B. 

The numbers A and B are called the confidence limits and the 
interval from A to B is called a confidence interval. 

Interval Estimate for P when N = 10. The abstract discus- 
sion of the preceding paragraph will now be illustrated by a com- 
putation making use of Figure 3-3, which applies to samples of 
10 cases. The first step in obtaining the numbers A and B is 
to select a confidence coefficient. If we are to make use of Figure 
3-3, which has been constructed with α = .05, we must in this 
illustration use a confidence coefficient of 1 − .05 = .95. The 
relation between the confidence coefficient and α will be explained 
below. We shall now assume that in a sample of 10 cases, 3 in- 
dividuals have been found to possess a particular characteristic, 
so p = .3. The proportion P of the population is unknown. 




On the baseline of Figure 3-3, locate the observed sample 
value, p = .3, and place a ruler or the edge of a card so as to in- 
dicate the vertical line through this point. Note where this line 
cuts the two curves. On the vertical scale read the values of 
these intersections, calling the smaller value A and the larger B. 
For p = .3 we find A = .05 and B = .65. The interval estimate 


.05 < P < .65 


can therefore be made with confidence coefficient .95. To dis- 
tinguish a confidence statement from a probability statement, this 
text will use the notation 


C(.05 < P < .65) = .95 


This may be expressed in words thus: “We have confidence .95 
that the unknown proportion lies between .05 and .65” or “We 
assert that P is not smaller than .05 and not larger than .65 and 
we have arrived at these numbers by a procedure which if applied 
repeatedly would yield interval estimates of which 5% would not 
contain the true value and of which the remaining 95% would 
contain it.” 

To understand why an interval estimate computed by the 
method just described is correct in 95% of samples, note that the 
curves in Figure 3-3 were drawn so that for any given value of 
P the probability is .95 that the sample point (P, p) is between 
the curves. Since this holds for any value of P it holds for the 
unknown P of the population being sampled. This means that 
for 95% of samples from the same population the values of p will 
determine a sample point which is on the horizontal line through 
P and which lies between the two curves. For all these values 
of p the confidence limits will contain P between them. Hence the 
statement that P is contained between the calculated limits can 
be made with .95 confidence. 

Figure 3-3 was constructed with level of significance α = .05. 
It is apparent now that the confidence coefficient is 1 − α, and 
the confidence interval for P may be stated in general as 

(3.1)    C(A < P < B) = 1 − α 
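
The limits read by eye from Figure 3-3 can be checked by direct computation. The Python sketch below is an added illustration and not part of the original charts or tables. It assumes the equal-tailed construction, in which A is the value of P for which a sample count as large as the observed x has probability α/2, and B is the value of P for which a count as small as x has probability α/2. The function names binom_cdf and confidence_limits are hypothetical.

```python
from math import comb

def binom_cdf(x, n, P):
    """P(X <= x) when X is binomial with n cases and population proportion P."""
    return sum(comb(n, k) * P**k * (1 - P)**(n - k) for k in range(x + 1))

def confidence_limits(x, n, alpha=0.05, steps=60):
    """Equal-tailed limits (A, B) for P, located by bisection on P."""
    # Lower limit A solves P(X >= x | P = A) = alpha/2; A is 0 when x = 0.
    if x == 0:
        A = 0.0
    else:
        lo, hi = 0.0, 1.0
        for _ in range(steps):
            mid = (lo + hi) / 2
            if 1 - binom_cdf(x - 1, n, mid) < alpha / 2:
                lo = mid   # upper tail still too small, so A lies above mid
            else:
                hi = mid
        A = (lo + hi) / 2
    # Upper limit B solves P(X <= x | P = B) = alpha/2; B is 1 when x = n.
    if x == n:
        B = 1.0
    else:
        lo, hi = 0.0, 1.0
        for _ in range(steps):
            mid = (lo + hi) / 2
            if binom_cdf(x, n, mid) > alpha / 2:
                lo = mid   # lower tail still too large, so B lies above mid
            else:
                hi = mid
        B = (lo + hi) / 2
    return A, B

# Sample of 10 cases with 3 showing the characteristic, so p = .3.
A, B = confidence_limits(3, 10)
print(f"C({A:.3f} < P < {B:.3f}) = .95")
```

Under these assumptions the computed limits are about .067 and .652, in reasonable agreement with the values .05 and .65 read from the chart.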

Location of Confidence Limits for Samples of Varying Size 
by the Use of Charts. The intervals which can be read from 
Figure 3-3 are only approximately correct because of the dis- 
continuity of the actual sampling distribution for N = 10. When 
N becomes larger, the binomial probability distribution appears 
smoother because the steps are smaller and more numerous so 
that there is less and less discrepancy between it and a fitted 
smooth distribution. For small samples such as N = 10 it is not 
possible to mark off exactly 2.5% of the area at each end of each 
distribution and so the volume outside the curved lines cannot 
be assumed to be precisely 5% of the total volume. These diffi- 
culties tend to disappear when samples of larger size are used. 
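
The effect of this discontinuity can be seen by computing, for a given P, the exact probability that the interval obtained from a sample of 10 contains P. The following Python sketch is an added illustration. It assumes equal-tailed limits found by bisection, which may differ slightly from limits read off the chart, and the names cdf, limits and coverage are hypothetical.

```python
from math import comb

def cdf(x, n, P):
    """P(X <= x) for a binomial count X with n cases and proportion P."""
    return sum(comb(n, k) * P**k * (1 - P)**(n - k) for k in range(x + 1))

def limits(x, n, alpha=0.05):
    """Equal-tailed confidence limits (A, B) for P given the count x."""
    def root(below):
        # below(P) is True while P is still smaller than the limit sought.
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            lo, hi = (mid, hi) if below(mid) else (lo, mid)
        return (lo + hi) / 2
    A = 0.0 if x == 0 else root(lambda P: 1 - cdf(x - 1, n, P) < alpha / 2)
    B = 1.0 if x == n else root(lambda P: cdf(x, n, P) > alpha / 2)
    return A, B

def coverage(P, n, alpha=0.05):
    """Probability that the interval computed from a random sample contains P."""
    table = [limits(x, n, alpha) for x in range(n + 1)]
    return sum(
        comb(n, x) * P**x * (1 - P)**(n - x)
        for x, (A, B) in enumerate(table)
        if A <= P <= B
    )

grid = [i / 20 for i in range(1, 20)]          # P = .05, .10, ..., .95
coverages = [coverage(P, 10) for P in grid]
for P, c in zip(grid, coverages):
    print(f"P = {P:.2f}   coverage = {c:.3f}")
```

In this sketch the coverage is never below .95 but is generally somewhat above it, because the tail areas cannot be made exactly 2.5% with so few possible values of p.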

If the rejection points of Figure 3-3 are not marked but only 
the curves drawn, it is possible to draw on one chart the con- 
fidence belts for samples of several different sizes. Such charts 
for confidence coefficients of .99 and .95 have been published by 
C. J. Clopper and E. S. Pearson.¹* Additional charts for confidence 
coefficients of .90 and .80 are published on pages 332 and 333 of 
Techniques of Statistical Analysis.² The Clopper-Pearson chart 
for confidence coefficient .95 is reproduced here as Chart VI in 
the Appendix. The small numbers written on the curved lines 
indicate the size of sample for which the confidence belt applies: 
N = 10, 15, 20, 30, 50, 100, 250, and 1000. Study of this chart 
reveals that for a given confidence coefficient the belt becomes 
narrower as sample size increases. As the size of the sample 
approaches that of the population, the belt narrows to a mere line. 
For any given sample size, the confidence belt would be wider 
if the confidence coefficient were larger. For any finite value of 
N, the confidence belt would narrow to a line as the confidence 
coefficient approaches zero. 
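
The narrowing of the belt with increasing N, and its widening with a larger confidence coefficient, can be illustrated numerically. The Python sketch below is an added illustration using equal-tailed limits computed by bisection rather than read from Chart VI; the sample sizes and the value p = .4 are chosen only for illustration, and the function names are hypothetical.

```python
from math import comb

def cdf(x, n, P):
    """P(X <= x) for a binomial count X with n cases and proportion P."""
    return sum(comb(n, k) * P**k * (1 - P)**(n - k) for k in range(x + 1))

def limits(x, n, alpha=0.05):
    """Equal-tailed confidence limits (A, B) for P given the count x."""
    def root(below):
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            lo, hi = (mid, hi) if below(mid) else (lo, mid)
        return (lo + hi) / 2
    A = 0.0 if x == 0 else root(lambda P: 1 - cdf(x - 1, n, P) < alpha / 2)
    B = 1.0 if x == n else root(lambda P: cdf(x, n, P) > alpha / 2)
    return A, B

# Same sample proportion p = .4 at several sample sizes, confidence .95.
results = {}
for n in (10, 30, 100, 250):
    x = round(0.4 * n)
    results[n] = limits(x, n)
    A, B = results[n]
    print(f"N = {n:3d}   {A:.2f} < P < {B:.2f}   width = {B - A:.2f}")

# Same sample (N = 30, p = .4) at confidence coefficients .95 and .99.
A95, B95 = limits(12, 30, alpha=0.05)
A99, B99 = limits(12, 30, alpha=0.01)
print(f"width at .95: {B95 - A95:.2f}   width at .99: {B99 - A99:.2f}")
```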

Confidence Intervals for Other Parameters. The preceding 
discussion of confidence intervals applies also to parameters other 
than P. A confidence statement about a population mean made 
with confidence coefficient .90, for instance, might read 

C(46.3 < μ < 52.7) = .90 

and one about a population variance made with confidence .99 
might read 

C(18.3 < σ² < 21.5) = .99 

Rules for obtaining the interval estimates for the mean, the 
variance, and the correlation coefficient will be developed in later 
chapters. 

Probability and Confidence. Some discussion of the two terms 
probability and confidence is needed to clarify the difference be- 
tween them. An analogy between sampling and games of cards 
will help to clarify the distinction. 

* References referred to by number will be found at the end of the Chapter. 

We may conceive of the deck of cards as a population, and of 
a hand dealt after shuffling the deck as a random sample. There 
are two ways of reasoning about the deck of cards and the dealt 
hand: 

(1) From a known or hypothesized characteristic of the deck 
we can calculate the probability that the hand will have some 
characteristic. We can, for example, calculate the probability 
that in a game of bridge a hand will contain 10 or more spades. 

This form of reasoning is analogous to computing the proba- 
bility of a characteristic of a random sample from a known, or 
hypothesized, characteristic of the population. 

(2) From the known composition of a hand we wish to draw 
some conclusion about the unknown composition of the deck. A 
player who sees his opponent draw a hand consisting of 13 spades 
may lose confidence in the notion that the deck is the standard 
bridge deck. He may go further and say he has confidence that 
the deck has any one of certain types of composition and that it 
does not have certain other types. 

The concept of probability is used in reasoning from a known 
population to a random sample. The concept of confidence is used 
in reasoning from an observed sample to its unknown population. 
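
The first kind of reasoning, from a known deck to a hand, is a straightforward probability computation. As an added illustration, the Python sketch below computes the probability mentioned above that a bridge hand of 13 cards dealt from a standard deck of 52 contains 10 or more spades. The function name prob_spades is hypothetical.

```python
from math import comb

def prob_spades(k, hand=13, spades=13, deck=52):
    """Probability that a random hand contains exactly k spades."""
    others = deck - spades
    return comb(spades, k) * comb(others, hand - k) / comb(deck, hand)

# Probability of 10, 11, 12 or 13 spades in one hand.
p_ten_or_more = sum(prob_spades(k) for k in range(10, 14))
print(f"P(10 or more spades) = {p_ten_or_more:.2e}")
```

The result is of the order of four in a million, which is why a player who sees such a hand may reasonably begin to doubt the hypothesis of a standard, well-shuffled deck.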


EXERCISE 3.2 

1. From Chart VI read the confidence interval for the population 
proportion if in a sample of 30 cases 9 cases have shown a particular 
characteristic, p = 9/30 = .3. Place a card or edge of a ruler on the chart 
to mark the vertical line through the point .3 on the horizontal axis. 
Note the points where this line cuts the two curves marked 30. The scale 
values of the ordinates of these points appear to be approximately .15 
and .50. Thus the interval sought is .15 < P < .50. 


2. From Chart VI verify the following interval estimates: 


a. If N = 20 and p = .4, .19 < P < .64 
b. If N = 30 and p = .4, .22 < P < .60 
c. If N = 100 and p = .4, .30 < P < .50 
d. If N = 1000 and p = .4, .37 < P < .43 


3. What interval would you read from Chart VI if N = 50 and p = .44? 
The scale point .44 is not marked on the horizontal axis but can be esti- 
mated by eye with fair accuracy. Then .29 < P < .58. 

4. What interval would you read if N = 40 and p = .75? For N = 30, 
the interval is .55 < P < .89. For N = 50 it is .60 < P < .87. A rough 
estimate for N = 40 would place the interval limits between these two 
but a little closer to the limits for N = 50. A fairly good guess would 
be .58 < P < .88. 

5. Read estimates for P from Chart VI when 


a. N = 20 and p = .65        d. N = 30 and p = .25 
b. N = 100 and p = .65       e. N = 50 and p = .25 
c. N = 250 and p = .65       f. N = 1000 and p = .25 


6. Comparing the answers obtained in question 5, what appears to be 
the effect on a confidence interval of changing the size of the sample while 
keeping the confidence coefficient constant? 

7. In a sample of 100 ball bearings chosen at random from those coming 
from a factory during one morning, 10 have too large a diameter and are 
considered defective. What interval estimate can be read from Chart VI 
for the proportion with too large diameter in the entire morning’s output 
of the factory? 

8. In a random sample of 96 parents of grade school children in City A 
it is found that 75 are in favor of using tax money to establish a nursery 
school in connection with the public schools. Formulate an interval 
estimate with α = .05 for the proportion of all parents holding the same 
opinion. 


Aids Available for Computing Binomial Probabilities. Com- 
putation of the terms of the binomial distribution is so laborious 
for large values of N that anyone who needs to make tests about 
proportions will wish to make use of published aids and also of 
methods of approximation. In the Appendix of this book are 
tables which can be used to reduce arithmetic labor. 

Table IVB gives the cumulative probability, or the sum, of 
the first m + 1 terms in either tail of the symmetrical binomial 
distribution for which P = .5, for values of N running from 5 to 
25. The entries in the right-hand column of Table 3.1 on page 45 
were taken from the entries in the row N = 10 of Table IVB. 


Thus for m = 1, .011 = .001 + .010 
         m = 2, .055 = .001 + .010 + .044 
         m = 3, .172 = .001 + .010 + .044 + .117 
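
These cumulative entries can be reproduced directly from the binomial terms. The short Python sketch below is an added illustration for N = 10 and P = .5; it does not reproduce Table IVB itself, and the variable names are hypothetical.

```python
from math import comb
from itertools import accumulate

N = 10
# Individual terms of the symmetrical binomial (.5 + .5)^10.
terms = [comb(N, k) / 2**N for k in range(N + 1)]
# cumulative[m] is the sum of the first m + 1 terms in one tail.
cumulative = list(accumulate(terms))

for m in range(4):
    print(f"m = {m}   term = {terms[m]:.3f}   cumulative = {cumulative[m]:.3f}")
```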


For P = .75, the cumulative probability for the first m + 1 
terms in the left tail is given in Table IVA, and for the last m + 1 
terms in the right tail in Table IVC. 

For P = .25, the cumulative probability for the first m + 1 
terms in the left tail is given in Table IVC, and for the last m + 1 
terms in the right tail in Table IVA. 




Suppose now that we wish to test the hypothesis P = .75 by 
means of a sample of 24 cases, using approximately the .05 level 
of significance. Suppose further that we wish to discard the hy- 
pothesis if p is either too much larger or too much smaller than 
.75. What is the region of significance? If we had the entire 
distribution of $(.25 + .75)^{24}$, we could find the left critical region 
by adding terms at the first end of the distribution, 


$(.25)^{24} + \binom{24}{1}(.25)^{23}(.75) + \binom{24}{2}(.25)^{22}(.75)^{2} + \cdots$ 


until the sum of those terms was approximately .025. The same 
result can be achieved painlessly by looking in row N = 24 of 
Table IVA for the entry nearest .025. This is .021 in column 
m = 13, and it indicates that 


P(X = 0, 1, 2, ..., 12 or 13 | N = 24 and P = .75) = .021 


Then the left critical region consists of the values X = 0, 1, ..., 13 
or of p = 0/24, 1/24, ..., 13/24. To find the right critical region, we could 
add terms at the right end of the distribution 


$(.75)^{24} + \binom{24}{1}(.25)(.75)^{23} + \binom{24}{2}(.25)^{2}(.75)^{22} + \cdots$ 


until their sum was approximately .025. This result can be 
achieved painlessly by looking in row N = 24 of Table IVC for 
the entry nearest .025. This is .040 in the column m=2. It 
may be read to mean either that 


        P(X = 0, 1 or 2 | N = 24 and P = .25) = .040 
or that P(X = 24, 23, or 22 | N = 24 and P = .75) = .040 


The latter interpretation provides us with the right critical region 
needed. It is X = 24, 23 and 22 or p = 24/24, 23/24, and 22/24. These two 
regions combined have probability .021 + .040 = .061. Because of 
the discrete nature of the distribution this is as near to the .05 
level as we can come. 
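
The table look-up just described amounts to finding, in each tail, the cumulative sum nearest to .025. The Python sketch below is an added illustration that computes the sums directly instead of reading Tables IVA and IVC; the function names are hypothetical.

```python
from math import comb

def pmf(x, n, P):
    """Probability of exactly x cases in a sample of n when the proportion is P."""
    return comb(n, x) * P**x * (1 - P)**(n - x)

def nearest_left_tail(n, P, target):
    """Return (m, prob) with prob = P(X <= m) nearest to target."""
    tails = [(m, sum(pmf(x, n, P) for x in range(m + 1))) for m in range(n + 1)]
    return min(tails, key=lambda t: abs(t[1] - target))

def nearest_right_tail(n, P, target):
    """Return (m, prob) with prob = P(X >= m) nearest to target."""
    tails = [(m, sum(pmf(x, n, P) for x in range(m, n + 1))) for m in range(n + 1)]
    return min(tails, key=lambda t: abs(t[1] - target))

# Hypothesis P = .75, sample of 24 cases, about .025 wanted in each tail.
left_m, left_prob = nearest_left_tail(24, 0.75, 0.025)
right_m, right_prob = nearest_right_tail(24, 0.75, 0.025)

print(f"left critical region:  X <= {left_m}, probability {left_prob:.3f}")
print(f"right critical region: X >= {right_m}, probability {right_prob:.3f}")
print(f"combined probability:  {left_prob + right_prob:.3f}")
```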

Under the same circumstances if we wish to use a one-tailed 
test, rejecting the hypothesis only for values of X much smaller 
than those expected under the hypothesis, we should look in 
Table IVA for the entry nearest .05. This is .055 in the column 
m = 14. It indicates that the critical region consists of X = 0, 
1, 2, ..., 14. If we wish to use a one-tailed test rejecting the 
hypothesis only for values of X much larger than those expected 
under the hypothesis, we should look in Table IVC for the entry 
nearest .05. It is .040 and indicates that the critical region with 
probability .04 consists of X = 24, 23 and 22. 

If some value of P other than .25, .50 or .75 is needed, Table V 
may be used. It provides the binomial coefficients $\binom{N}{m}$ from 
which the terms of the binomial distribution can be obtained by 
multiplication. The (m + 1)st term of the distribution for given 
N and P is obtained by multiplying the number in row N and 
column m of this table by $Q^{N-m}P^{m}$. Up to N = 10 all the coeffi- 
cients are shown. For N > 10 the page is too narrow to ac- 
commodate the entire set, but as the set is always symmetrical 
the remaining coefficients can easily be obtained from those 
presented. The coefficient for term N − m is the same as for 
term m. Table V is useful only for N not larger than 20. 
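
The multiplication described here is easily carried out by machine. The Python sketch below is an added illustration; the values N = 12 and P = .3 are hypothetical and are not taken from Table V, and the function name binomial_terms is likewise an assumption.

```python
from math import comb

def binomial_terms(N, P):
    """Terms of (Q + P)^N: coefficient times Q^(N-m) times P^m, for m = 0..N."""
    Q = 1 - P
    return [comb(N, m) * Q**(N - m) * P**m for m in range(N + 1)]

N, P = 12, 0.3   # hypothetical values chosen only for illustration
coefficients = [comb(N, m) for m in range(N + 1)]
terms = binomial_terms(N, P)

# The coefficients are symmetrical: term N - m has the same coefficient as term m.
print(coefficients)
print([round(t, 3) for t in terms])
```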

Very extensive tables of the binomial probability distribution 
have been published by the National Bureau of Standards.* 
These give values of the individual terms of $(Q + P)^{N}$ for values 
of P from .01 to .50 by intervals of .01 and for values of N from 
2 to 49 by intervals of 1. They also give values of the sum of 
the terms for the same values of P and N. 


EXERCISE 3.3 

1. Suppose that from the rolls of a large university the cards of 22 
students are selected at random and these students are asked to agree or 
disagree with a statement of opinion on some public issue of the day, but 
are not permitted to say they are undecided or uninformed. Let p be the 
proportion answering “Yes” and q the proportion answering “No.” If 
the hypothesis that P = .5 proves tenable, it will be concluded that the 
students in this university have no consistent opinion on this issue. What 
number of “Yes” answers would constitute a critical region for the rejec- 
tion of this hypothesis? Answer for the .02 level of significance, and for the 
.05 level. 

Solution. The population, which is the entire student body of the 
university, is not infinite but is large. The hypothesis will be refuted if 
either too many or too few students answer “Yes.” At the .02 level of 
significance, the critical region will consist of the most extreme terms at 
the lower end of the binomial with P = .5 and N = 22 such that the sum 
of the probabilities of those terms is .01 and the most extreme terms at the 
upper end of the distribution such that the sum of the probabilities of those 
terms is .01. On line N = 22 of Table IVB we see that .008 is the sum of 
6 terms at one end of the distribution. Therefore if X is the number of 
“Yes” answers, at the .016 level of significance (which is the nearest pos- 
sible value to .02 when the critical region is in both tails), the critical 
region consists of values of X ≤ 5 and X ≥ 17. At the .05 level of sig- 
nificance the critical region is X ≤ 6 and X ≥ 16. 

2. Answer the question of problem 1 under the following circumstances: 
a. N = 14 and level of significance is .06 
b. N = 18 and level of significance is .03 
c. N = 25 and level of significance is .05 


One-sided and Two-sided Tests of Hypotheses. In Chapter 1, 
in testing the hypothesis that a subject’s intelligence could not 
be inferred from his photograph, it was tacitly assumed that rank- 
ing photographs in completely wrong order (DCBA when ABCD 
is correct) or misjudging every pair of photographs (thus making 
the record − − − − −) would not throw suspicion on the hypothesis. 
Therefore the critical region was located entirely in one 
tail of the probability distribution. Such tests are called one- 
tailed tests, or one-sided tests, or tests of a one-sided hypothesis. In 
Figure 3-1 on page 47 and the test related to it, the critical region 
was located in both tails of the distribution, half of the region 
being in each tail. Such tests are called two-tailed tests or two- 
sided tests, or tests of a two-sided hypothesis. Up to this point the 
choice between a one-sided and a two-sided test has been made 
more or less intuitively from the general logic of the question to 
be answered by the test. We shall now develop certain concepts 
which will assist in clarifying the reasons for the location of the 
critical region. 

The letter H is customarily used as an abbreviation for hy- 
pothesis. Thus H : P = .5 means “the hypothesis that in the 
population the proportion of cases having the given character- 
istic is .5” while H : P ≥ .5 means “the hypothesis that in the 
population the proportion of cases having a given characteristic 
is .5 or greater.” 

Two Kinds of Error. Rejecting a hypothesis when it is true 
is called an error of the first kind. The level of significance is the 
probability of making an error of this kind. As the level of signif- 
icance is commonly called α (the small Greek letter alpha cor- 
responding to the English a), an error of the first kind is also 
called an alpha error. The probability of an error of this kind 
can be made arbitrarily small by making alpha small, but unfortu- 
nately a reduction in the probability of rejecting a hypothesis 
when it is true is accompanied by an increase in the probability 
of accepting it when some alternative to the hypothesis is true. 
This latter error is called an error of the second kind or a beta error. 
The Greek letter β (beta) is used to represent the probability of ac- 
cepting a hypothesis when some alternative is true. Most statis- 
ticians do not look with favor upon choosing an extremely small 
level of significance because that would expose them to a large risk 
of error of the second kind. This relation will be clarified in the 
following paragraphs. 

Suppose that P is actually .6 but we do not know that fact 
and we test the hypothesis P = .5 using a sample of 10 cases. 
Suppose also that, knowing the probability distribution of p under 
the hypothesis tested, we decide upon p = 0, .1, .9 or 1.0 as the 
region of rejection. Thus 


α = .001 + .010 + .010 + .001 = .022 or .02. 


What is the probability that the hypothesis P = .5 will be rejected 
when P is really .6? Accepted when P = .6? 

Table 3.2 on page 49 provides a very easy means of answering 
this question. As we are assuming that P is actually .6, the true 
probability distribution of p is given in the row for which P = .6. 
Then 


    P(p = 0, .1, .9 or 1.0 | P = .6) = 0 + .002 + .040 + .006 = .048 
and P(.2 ≤ p ≤ .8 | P = .6) = 1 − .048 = .952 


The risk of a false acceptance of H : P = .5 when P = .6 is therefore 
β = .952. With a larger discrepancy between the true value of 
P and the value under the hypothesis, the risk of acceptance of a 
false hypothesis would be less. For example 


    P(p = 0, .1, .9 or 1.0 | P = .1) = .349 + .387 + 0 + 0 = .74 
and P(.2 ≤ p ≤ .8 | P = .1) = 1 − .74 = .26 


Hence the risk of false acceptance is only .26 when P = .1. The 
probabilities here treated as zero are not absolutely zero but are 
merely numbers so small that only zeros occur in the first three 
decimal places. Rounding errors cause some of the totals at the 
right of Table 3.2 to be slightly different from 1.000. 
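
The computations of α and β above can be repeated from the binomial terms themselves rather than from Table 3.2. The Python sketch below is an added illustration; the names REJECT and prob_reject are hypothetical. Because the sketch uses unrounded terms, its value of α is about .0215 rather than the .022 obtained above by adding terms already rounded to three places.

```python
from math import comb

N = 10
# Rejection region p = 0, .1, .9 or 1.0, expressed as counts X = N * p.
REJECT = {0, 1, 9, 10}

def prob_reject(P):
    """Probability that the sample count falls in the rejection region when the true proportion is P."""
    return sum(comb(N, x) * P**x * (1 - P)**(N - x) for x in REJECT)

alpha = prob_reject(0.5)         # error of the first kind: H : P = .5 is true
beta_6 = 1 - prob_reject(0.6)    # error of the second kind when P = .6
beta_1 = 1 - prob_reject(0.1)    # error of the second kind when P = .1

print(f"alpha           = {alpha:.3f}")
print(f"beta at P = .6  = {beta_6:.3f}")
print(f"beta at P = .1  = {beta_1:.2f}")
```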

Similar computations have been made for each of the values 
of P shown in Table 3.2 and these have been recorded in columns 2 
and 3 of Table 3.3. For all these the level of significance was 
taken as α = .02 and the region of significance as p = 0, .1, .9 or 1.0. 

If the region of significance is increased to include p = .2 and 
p = .8, the significance level becomes α = .11. The probability 
of rejecting H : P = .5 for this situation has been computed for 
each of the same values of P and recorded in column 4 of Table 
3.3; the probability of accepting H : P = .5 is recorded in column 5. 


TABLE 3.3 The Probability of Accepting the Hypothesis P = .5 on Evidence 
from a Sample of 10 cases when P has a Specified Value, if α = .02 and if 
α = .11 


                        α = .02                        α = .11 

True         Probability    Probability     Probability    Probability 
Value of P   of rejecting   of accepting    of rejecting   of accepting 
             H : P = .5     H : P = .5      H : P = .5     H : P = .5 

   .95           .91            .09             .99            .01 
   .90           .74            .26             .93            .07 
   .85           .54            .46             .82            .18 
   .80           .38            .62             .68            .32 
   .75           .24            .76             .53            .47 
   .70           .15            .85             .38            .62 
   .65           .09            .91             .27            .73 
   .60           .05            .95             .18            .82 
   .55           .03            .97             .13            .87 
   .50           .02            .98             .11            .89 
   .45           .03            .97             .13            .87 
   .40           .05            .95             .18            .82 
   .35           .09            .91             .27            .73 
   .30           .15            .85             .38            .62 
   .25           .24            .76             .53            .47 
   .20           .38            .62             .68            .32 
   .15           .54            .46             .82            .18 
   .10           .74            .26             .93            .07 
   .05           .91            .09             .99            .01 


Rejection values are p = 0, .1, .9 and 1.0 when α = .02 
Rejection values are p = 0, .1, .2, .8, .9 and 1.0 when α = .11 


The test of a statistical hypothesis involves a statistic and 
a critical region. The probability that the statistic will fall in 
the critical region is the probability that the hypothesis will be 
rejected. This probability of rejecting a hypothesis is called the 
power of the test. The power is a variable which depends upon 
what alternative to the hypothesis is actually true. If we knew 
that a certain alternative were true we should not need to test 
the hypothesis. So we cannot establish the power of the test for 
the one correct alternative but must consider the power in general 
for various alternatives. If two tests are equally satisfactory as 
to the level of significance and so involve the same risk of rejecting 
a hypothesis when it is true, but one test is more powerful than 
the other because it is more likely to cause rejection of a false 
hypothesis, then the more powerful test is to be preferred. An 
illustration of this situation will be given presently. 

Certain important relations can be brought out more clearly 
by means of a graph. In Figure 3-4 values of P which are alterna- 
tives to the hypothesis are laid off on the horizontal scale. The 
value under the hypothesis is marked H. The vertical axis pass- 
ing through H is scaled to show probabilities from 0 to 1. At 
each of the indicated values of P an ordinate is erected with 
length proportional to the power of the test for that alternative. 
For the curve marked α = .02 these ordinates represent the proba- 
bilities in column 2 of Table 3.3; for the curve marked α = .11 
the ordinates represent the probabilities in column 4. The curved 
line which connects the tops of one set of these ordinates is the 
graph of the power function of the test concerned. This graph 
crosses the vertical axis at a point whose ordinate is the level 
of significance. 

[Figure: probability of rejecting H : P = .50 plotted against the scale of P] 

FIG. 3-4. Power function of the test of 
hypothesis P = .5 for N = 10 at two levels 
of significance, α = .02 and α = .11. 
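
The ordinates of the two power functions can be recomputed for each alternative value of P. The Python sketch below is an added illustration; it evaluates the power at P = .05, .10, ..., .95 for both rejection regions, and the names REGION_02, REGION_11 and power are hypothetical.

```python
from math import comb

N = 10
REGION_02 = {0, 1, 9, 10}          # p = 0, .1, .9, 1.0
REGION_11 = {0, 1, 2, 8, 9, 10}    # p = 0, .1, .2, .8, .9, 1.0

def power(P, region, n=N):
    """Probability of rejecting H : P = .5 when the true proportion is P."""
    return sum(comb(n, x) * P**x * (1 - P)**(n - x) for x in region)

grid = [i / 20 for i in range(1, 20)]   # P = .05, .10, ..., .95
print(" P     power (alpha = .02)   power (alpha = .11)")
for P in grid:
    print(f"{P:.2f}        {power(P, REGION_02):.2f}                  {power(P, REGION_11):.2f}")
```

At P = .5 the computed power equals the level of significance, which is the point where each curve crosses the vertical axis through H.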


EXERCISE 3.4 

1. Verify the probabilities stated in Table 3.3 by computation from 
the data of Table 3.2. 

2. Make similar computations with α = .002 and critical region p = 0 
or 1. Plot the corresponding power function on the chart of Figure 3-4. 
Note that it lies entirely below the other graphs. Does this mean the test 
with α = .002 is more powerful or less powerful than the test with α = .02? 

3. Answer each of the following questions by naming a line segment 
in Figure 3-4. Place the letter J where the lower curve and K where the 
upper curve cut the vertical axis. Place M at top of that axis. 

a. α = .02        b. α = .11 

c. the probability of rejecting H at significance level α = .02 if P is .2 

d. the probability of rejecting H at significance level α = .11 if P is .2 

e. the probability of accepting H when it is true if α = .02 

f. the probability of accepting H when it is true if α = .11 

g. the probability of accepting H when the actual value is P = .2 if 
α = .02 

h. the probability of accepting H when the actual value is P = .2 if 
α = .11 

i. β when P = .20 and α = .02 

j. β when P = .20 and α = .11 

4. From your answers to question 3 which significance level provides 
the more desirable test when the hypothesis tested is true? Which provides 
the more desirable test when some other hypothesis is true? 

5. In testing H : P = .5 with critical region p = 0, .1, .9, and 1, against 
which alternative is the test more powerful, P = .60 or P = .80? 

6. Is the test more powerful in general when α = .02 or when α = .11? 

7. In Figure 3-4 what represents the probability of error when the 
hypothesis is true? When it is not true? 

8. Suppose a one-sided test with critical region p = .7, .8, .9, and 1 is 
made of H : P = .50. What is the value of α? Compute the power of this 
test for each alternative value of P shown in Table 3.2. Make a graph of 
the power function of this test. 

9. Suppose a one-sided test with critical region p = 0, .1, .2, and .3 is 
made of H : P = .50. What is the value of α? Compute the power of 
this test for each alternative value of P shown in Table 3.2. Make a graph 
of the power function of this test. 


Choice of Critical Region. Figure 3-5 displays the power func- 
tion for each of three different tests of the hypothesis P = .50. 
These tests differ as to the location of the critical region. For 
test A the region is located wholly in the upper tail; for test C 
it is wholly in the lower tail; for test B half of the region is in 
the upper tail and half in the lower. The power function for test 
B has already been drawn in Figure 3-4 and is reproduced here 
for the sake of comparison with those for the one-sided tests. 
The situations in which one of these three tests is more powerful 
than the others could be explained better if it were possible to 
make all three regions identical in size, that is to give all three 
tests the same level of significance. However, the discrete nature 
of the binomial distribution makes it impossible to choose a level 
of significance arbitrarily and so it is impossible to find a one- 
sided region and a two-sided region having exactly the same 
probability. Later when we work with continuous probability 
distributions it will be possible to draw a figure similar to Figure 
3-5 but with all three curves crossing the vertical axis at the same 
point. The meaning of these graphs will be clarified by the answers 
to the questions in Exercise 3.4 and you may wish to study that 
exercise before you read the following paragraph. 


[Figure: probability of rejecting H : P = .50 plotted against the scale of P] 

FIG. 3-5. Power function of the test of hypothesis 
P = .5 for N = 10. 
A. With α = .17 and critical region p = .7, .8, .9, and 1.0. 
B. With α = .11 and critical region p = 0, .1, .2, .8, .9, and 1.0. 
C. With α = .17 and critical region p = 0, .1, .2, and .3. 

Certain generalizations can now be made. (1) When a hy- 
pothesis is true, the probability of falsely rejecting it depends 
only on the level of significance and is not at all affected by the 
location of the critical region. The probability of an α error is 
fixed and is the same no matter whether a one-sided test is em- 
ployed or a two-sided test. (2) After the probability of type I 
error has been controlled by the arbitrary selection of the signif- 
icance level, the critical region can be so located as to secure 
maximum power against a particular alternative or set of alterna- 
tives. If the logic of the problem makes it important to reject 
H when P > P_H and unimportant to reject when P < P_H, then 
the critical region should consist of values at the extreme of the 
right-hand tail. On the other hand if there is no logical need to 
reject H when the true value is larger than the hypothesized value 
but every reason to reject it when the true value is smaller, then 
the lower tail will be critical. If deviations from hypothesized 
value in one direction are logically just as damaging to the hy- 
pothesis as deviations in the other direction, then a two-sided 
test should be used with half of the critical region in each tail. 
(3) Other things being equal, an increase in α is accompanied 
by a decrease in β and vice versa. 
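
These generalizations can be checked numerically for the three tests of Figure 3-5. The Python sketch below is an added illustration; the dictionary TESTS and the function power are hypothetical names, and the alternatives P = .6 and P = .4 are chosen only as examples.

```python
from math import comb

N = 10
TESTS = {
    "A": {7, 8, 9, 10},          # upper tail only: p = .7, .8, .9, 1.0
    "B": {0, 1, 2, 8, 9, 10},    # both tails: p = 0, .1, .2, .8, .9, 1.0
    "C": {0, 1, 2, 3},           # lower tail only: p = 0, .1, .2, .3
}

def power(P, region, n=N):
    """Probability of rejecting H : P = .5 when the true proportion is P."""
    return sum(comb(n, x) * P**x * (1 - P)**(n - x) for x in region)

# Level of significance of each test: power evaluated at the hypothesized P.
alphas = {name: power(0.5, region) for name, region in TESTS.items()}
# Power against an alternative above and an alternative below the hypothesis.
above = {name: power(0.6, region) for name, region in TESTS.items()}
below = {name: power(0.4, region) for name, region in TESTS.items()}

for name in TESTS:
    print(f"test {name}: alpha = {alphas[name]:.2f}, "
          f"power at P = .6 is {above[name]:.2f}, "
          f"power at P = .4 is {below[name]:.2f}")
```

Against an alternative above the hypothesized value, test A is the most powerful and test C the least; against an alternative below it the order is reversed, with the two-sided test B between them in both cases.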




The form in which the hypothesis is stated may be used to 
indicate the location of the critical region, thus: 

H : P ≥ .8, implies a one-sided test with left tail critical 

H : P ≤ .8, implies a one-sided test with right tail critical 

H : P = .8, implies a two-sided test 



This practice is not universal, however, and a one-sided hypothesis 
may be written as in question 1 of Exercise 3.5. 


EXERCISE 3.5 
1. From the data of Table 3.2 find the probability of rejecting H : P = .5 
if the critical region is p = .7, .8, .9 and 1 and if P = .35, .40, .60, .70. Plot 
these probabilities on Figure 3-5. Each of the points should fall on the 
curve marked A. 
2. Do the same for the critical region p = 0, .1, .2, and .3. These 
points should fall on the curve marked C. 
3. What is the probability of rejecting a true hypothesis if test A is 
used? Test B? Test C? 
4. When the hypothesis is true, does the critical region for B provide 
a more or less satisfactory test than the critical regions for A and for C? 
Make your decision in the light of your answer to question 3. 
5. When the true probability is P = .40, estimate from the graph 
a. the probability of rejecting H : P = .50 when critical region is 
p = .7, .8, .9, and 1 

b. the probability of rejecting H : P = .50 when critical region is 
p = 0, .1, .9, and 1 

c. the probability of rejecting H : P = .50 when critical region is 
p = 0, .1, .2, and .3. 

d. the probability of an error of the second kind when Test A is used 

e. the probability of an error of the second kind when Test B is used 

f. the probability of an error of the second kind when Test C is used 

6. If P > P_H, which of the three tests has the greatest power? 
Which has the least power? 

7. If P < P_H, which has the greatest power? Which the least? 

8. A research worker has tested the hypothesis P ≥ .75 using a sample 
of 20 cases. He decides to make α as near .05 as possible. 

a. What is the critical region? Ans. From the way in which the 
hypothesis is stated, we see that the critical region should be located in the 
left tail of the binomial for which P = .75 and N = 20. Reference to the 
row N = 20 of Table IV shows α = .041 as the value nearest to α = .05. 
This value corresponds to m = 11, or p = 11/20 = .55. Hence the 
critical region is p ≤ .55. 

b. If he obtains a value of p = .40 what decision should he make con- 
cerning the hypothesis? Ans. Reject it. 

c. In making this decision, what risk does he take of making an error 
of the first kind? Ans. Risk is less than α = .041. 

d. In making this decision, what risk does he take of making an error 
of the second kind? Ans. None whatever, he has accepted no hypothesis 
at all so cannot possibly have accepted a false one. 

9. Answer the same four questions if the sample value had been 
p = .70. 

a. Same answer. 

b. Accept hypothesis. 

c. No risk whatever of rejecting true hypothesis. 

d. There is a risk of error of second type but that risk depends on the 
alternative P4 and cannot be assessed from the data presented. 

10. Below are several statements of hypotheses, with critical region 
and level of significance for each. By reference to Table IV, verify the 
correspondence of significance level and critical region. In the column 
headed Decision, record A or R to indicate whether the hypothesis 
should be accepted or rejected. In the next column indicate the risk of 
error of the first type. In the final column, record О if there is no risk of 
error of the second type and record X if there is such risk. The magnitude 
of such risk cannot be computed from the given data. 


А S Risk of error 
b N H d a Observed Decision First Second 
d type уре 
1 25 P2.75 ps6 071 5 — — — 
"i ps3 А 5 
2 20 =.5 Е > т 114 A 
3 24 P2.25 р < 125 115 10 — — — 
T р > 75 20 па ИИ 
4 24 = .5 ie < 25 022 63 
5 22 Р 215 р=.50 .010 .67 == — — 


Large Sample Tests Concerning Proportions. The approxima- 
tion of the normal distribution to the binomial discussed in 
Chapter 2 can be used as a basis for large sample tests of hypo- 
theses on proportions. Use of the normal distribution for this 
purpose means that areas under the unit normal curve replace 
sums of terms in the tails of the distribution. For this purpose 
the statistic 


$$z = \frac{p - P}{\sqrt{\dfrac{PQ}{N}}} \tag{3.1}$$

introduced in Chapter 2 is treated as a normal deviate. 


To test a hypothesis about P on the basis of a sample value
p, making use of the normal distribution, we first choose a level
of significance α, say α = .05 or α = .01. Then we decide from
the logic of the problem where the critical region should be
located. The hypothesis is rejected whenever

for a one-tailed test with right-tail critical, $z > z_{1-\alpha}$
for a one-tailed test with left-tail critical, $z < z_{\alpha}$
for a two-tailed test, $z < z_{\alpha/2}$ or $z > z_{1-\alpha/2}$.


As an application consider the following problem: 

The proportion of all births which are male is known to be approxi- 
mately .51, a number which is generally held to be unrelated to any easily 
identified factor such as geographic location, race or economic status of 
parents. It has, however, sometimes been asserted that the ratio of male 
births tends to be higher in the years following a war or other great disaster. 
Suppose that in the 12-month period immediately following the close of 
World War II the birth records of a certain city showed 217 male births 
and only 184 female, the proportion of male births being thus 217/401 = .54
instead of .51. Is this excess of male births over expectation to be inter- 
preted as evidence that some factor has been at work to increase the ratio? 


As phrased the question calls for a one-sided test of H: P = .51,
since we wish to reject that hypothesis only if the proportion of
male births is too high and not if it is too low. Suppose the level
of significance is chosen to be α = .05. Then $z_{.95} = 1.645$ and the
region of rejection is z > 1.645. But the observed value of z is

$$z = \frac{.54 - .51}{\sqrt{\dfrac{(.51)(.49)}{401}}} = 1.20$$

As z = 1.20 does not fall in the critical region, the hypothesis
P = .51 is accepted. The deviation from hypothetical value is no
greater than may be expected to arise by chance as the result of 
random sampling. 
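
The arithmetic of this test can be checked with a short sketch in Python. The function name `z_statistic` and the variable names are ours, introduced only for illustration; the critical value is taken from the standard library's normal distribution rather than from a printed table.

```python
from math import sqrt
from statistics import NormalDist


def z_statistic(p, P, N):
    """Formula (3.1): (p - P) divided by the standard error sqrt(PQ/N)."""
    Q = 1 - P
    return (p - P) / sqrt(P * Q / N)


alpha = 0.05
z_critical = NormalDist().inv_cdf(1 - alpha)  # upper 5 per cent point, about 1.645
z_observed = z_statistic(0.54, 0.51, 401)     # p rounded to .54, as in the text
reject = z_observed > z_critical              # right-tail critical region

print(f"z = {z_observed:.2f}, critical value = {z_critical:.3f}, reject H: {reject}")
```

If the unrounded proportion 217/401 is used in place of .54, z rises to about 1.25, which still falls short of the critical value, so the decision is unchanged.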

Confidence Limits for Proportions Computed from Large 
Samples. From Chart VI confidence limits can be obtained for
selected values of N when the confidence coefficient is .95.
However, a chart cannot well accommodate lines for all sizes of N,
and a separate chart is required for every size of confidence 
coefficient. Moreover, sometimes greater accuracy is required 
than can be obtained by graphic methods. It is desirable to have 
a method for obtaining confidence limits by formula. As before 
it must be assumed that N is not small and P is not near 0 or 1, 
so the normal approximation to the binomial may be used. 


To obtain the confidence limits we first select the confidence
coefficient arbitrarily. The confidence limits are then derived
from the large sample distribution of the sample proportion p. We
know that for samples from a two-class population with given P
the probability is 1 − α that

$$z_{\alpha/2} < \frac{p - P}{\sqrt{\dfrac{PQ}{N}}} < z_{1-\alpha/2} \tag{3.2}$$


This inequality can be restated in the following form: 


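$$\frac{p + \dfrac{z_{\alpha/2}^2}{2N} + z_{\alpha/2}\sqrt{\dfrac{pq}{N} + \dfrac{z_{\alpha/2}^2}{4N^2}}}{1 + \dfrac{z_{\alpha/2}^2}{N}} < P < \frac{p + \dfrac{z_{1-\alpha/2}^2}{2N} + z_{1-\alpha/2}\sqrt{\dfrac{pq}{N} + \dfrac{z_{1-\alpha/2}^2}{4N^2}}}{1 + \dfrac{z_{1-\alpha/2}^2}{N}} \tag{3.3}$$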


The second inequality is obtained from the first by algebraic 
manipulation which does not change the probability. These 
extremes are, therefore, confidence limits with confidence
coefficient 1 − α when p is the proportion observed in a sample.

A good approximation to the inequality in Formula (3.3) is
given by the inequality

$$p + z_{\alpha/2}\sqrt{\frac{pq}{N}} < P < p + z_{1-\alpha/2}\sqrt{\frac{pq}{N}} \tag{3.4}$$
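
To see how close the approximation is, the two sets of limits may be computed side by side. The sketch below is ours and not part of the original derivation: the function names are hypothetical, and the figures p = .54, N = 401 with confidence coefficient .95 are reused from the birth example purely as an illustration.

```python
from math import sqrt
from statistics import NormalDist


def limits_exact(p, N, conf):
    """Limits found by solving the inequality (3.2) for P."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z for the upper tail
    q = 1 - p
    centre = p + z * z / (2 * N)
    half = z * sqrt(p * q / N + z * z / (4 * N * N))
    denom = 1 + z * z / N
    return (centre - half) / denom, (centre + half) / denom


def limits_approx(p, N, conf):
    """Simpler large-sample limits: p plus or minus z * sqrt(pq/N)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half = z * sqrt(p * (1 - p) / N)
    return p - half, p + half


lo_exact, hi_exact = limits_exact(0.54, 401, 0.95)
lo_approx, hi_approx = limits_approx(0.54, 401, 0.95)
print(f"exact:  {lo_exact:.4f} to {hi_exact:.4f}")
print(f"approx: {lo_approx:.4f} to {hi_approx:.4f}")
```

For a sample of this size the two pairs of limits agree to about the third decimal place; the gap widens as N grows smaller or as p approaches 0 or 1.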


Number of Cases Needed to Obtain a Confidence Interval
of a Given Width. Suppose a polling organization has been asked 
to estimate the number of voters in a given state who are in favor 
of a particular proposition. The specifications are that the es- 
timate shall be made in the form of an interval 


C(A < P < B) = .99


in which the width of the interval B — A = W is not greater than 
.05. By taking enough cases in their sample the agency can keep 
the interval to any predetermined value. Therefore an important 
part of their planning is the decision as to how many cases to use. 
Let us suppose that some preliminary information is available
from which they estimate that P will be somewhere near .75.
They now have agreed that

1 − α = .99
p' = .75
W = .05

As only a fairly rough estimate of N is required, they may begin
work with Formula (3.4) from which

$$A = p' + z_{\alpha/2}\sqrt{\frac{p'q'}{N}} \quad\text{and}\quad B = p' + z_{1-\alpha/2}\sqrt{\frac{p'q'}{N}}$$

so that

$$W = B - A = \left(z_{1-\alpha/2} - z_{\alpha/2}\right)\sqrt{\frac{p'q'}{N}} = 2z_{1-\alpha/2}\sqrt{\frac{p'q'}{N}}$$

Therefore

$$N = \frac{4z_{1-\alpha/2}^2\,p'q'}{W^2} \tag{3.5}$$

Substituting the given values yields

$$N = \frac{4(2.576)^2(.75)(.25)}{(.05)^2} = 1991$$


On the basis of this preliminary analysis, the agency might de- 
cide to take a sample of 1991, or it might decide to ask the author- 
ities to allow it to work with a lower confidence coefficient which 
would not require so large a sample. 
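
The planning computation is easily repeated for other specifications. The following is a minimal sketch of Formula (3.5); the function name `cases_needed` is ours, and the result is rounded up to the next whole case.

```python
from math import ceil
from statistics import NormalDist


def cases_needed(p_guess, width, conf):
    """Formula (3.5): N = 4 z^2 p'q' / W^2, rounded up to a whole case."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil(4 * z * z * p_guess * (1 - p_guess) / width ** 2)


n_99 = cases_needed(0.75, 0.05, 0.99)  # the agency's specifications
n_95 = cases_needed(0.75, 0.05, 0.95)  # a lower confidence coefficient
print(n_99, n_95)
```

The second call shows the effect of the alternative mentioned above: lowering the confidence coefficient to .95 reduces the required sample considerably.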

Suppose a sample of 1991 cases is taken and the observed 
value of p found to be .70. As the sample is so large, Formula 
(3.4) may be used to compute the confidence interval. The ob- 
tained interval is then 


.6735 < P < .7265


and W = .7265 — .6735 = .053 is very close to the width of interval 
which has been agreed upon as acceptable. The practical diffi- 
culty in this problem would be to obtain a random sample. If 
the sample is taken by some non-random method, say by the 
quota method often used by polling agencies, there is no assurance 
that p will have the probability distribution which is assumed in 
the formula employed. 

Sampling from a Finite Population. Some readers may have
questioned whether the methods we have been using are
appropriate if the population is finite. To illustrate with an extreme
case, consider a population of 100 individuals of whom 60 possess 
a given characteristic, and 40 do not, so that P = .60. Now sup- 
pose samples of 99 individuals are drawn. It is intuitively clear 
that a sample of 99 individuals must be quite similar to the entire 
population of 100. Since such a sample must contain all of the 
population except one individual, there are 60 different samples 
which can be obtained by eliminating one individual which 
possesses the trait and 40 samples obtained by eliminating one 
which does not possess that trait. 


Composition of sample                  Number of samples        p
60 cases with trait, 39 without it            40          60/99 = .6060
59 cases with trait, 40 without it            60          59/99 = .5959
                                             100

$$E(p) = \frac{40(.6060) + 60(.5959)}{40 + 60} = .60$$

$$\sigma_p^2 = \frac{40(.6060 - .60)^2 + 60(.5959 - .60)^2}{40 + 60} = .000024486$$


This manner of computing $\sigma_p^2$ might be very laborious, but
the result can be obtained by formula. If M represents the number
of cases in the population and N the number in the sample,
then

$$\sigma_p^2 = \frac{PQ}{N}\cdot\frac{M - N}{M - 1} \tag{3.6}$$

In the situation for which we have already computed $\sigma_p^2$ directly,
P = .60, M = 100, N = 99, and therefore

$$\sigma_p^2 = \frac{(.60)(.40)}{99}\cdot\frac{100 - 99}{100 - 1} = \frac{.24}{9801} = .000024487,$$


which differs from the previously obtained value only in the final 
digit because of a rounding error. 
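
That the discrepancy is purely one of rounding can be confirmed by repeating both computations in exact fractions. The sketch below is ours; it lists the two kinds of sample of 99 described above and compares the directly computed variance with the value given by Formula (3.6).

```python
from fractions import Fraction

M, with_trait, N = 100, 60, 99   # the population and sample size of the text

# Leaving out one member of the population gives 100 possible samples:
# 40 omit a case without the trait (p = 60/99), 60 omit a case with it (p = 59/99).
samples = [(40, Fraction(60, 99)), (60, Fraction(59, 99))]

total = sum(count for count, _ in samples)
mean = sum(count * p for count, p in samples) / total
var_direct = sum(count * (p - mean) ** 2 for count, p in samples) / total

P = Fraction(with_trait, M)
var_formula = P * (1 - P) / N * Fraction(M - N, M - 1)

print(float(mean), float(var_direct), float(var_formula))
```

Carried in fractions, the two variances are identical, both being 6/245025.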
The formula $\sigma_p^2 = \dfrac{PQ}{N}$ is a special case of $\dfrac{PQ}{N}\cdot\dfrac{M - N}{M - 1}$ when M is so
large that the fraction $\dfrac{M - N}{M - 1}$ may be treated as unity.

Sometimes samples are drawn with replacement, each individual
being returned to the supply before the next one is drawn. In
such a case the sampling is in effect as from an infinite universe
and the multiplier $\dfrac{M - N}{M - 1}$ should not be used. If N is large and
M is considerably larger than N, the statistic

$$z = \frac{p - P}{\sqrt{\dfrac{PQ}{N}\cdot\dfrac{M - N}{M - 1}}} \tag{3.7}$$


has a sampling distribution which is approximately normal. This 
statistic can be used to test hypotheses about P as was done in 
the case of sampling from an infinitely large population using the 
statistic in formula (2.16) on page 37. 


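The corresponding confidence interval for P with coefficient
1 − α is (approximately)

$$p + z_{\alpha/2}\sqrt{\frac{pq}{N}\cdot\frac{M - N}{M - 1}} < P < p + z_{1-\alpha/2}\sqrt{\frac{pq}{N}\cdot\frac{M - N}{M - 1}} \tag{3.8}$$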


If it is proposed to plan an experiment so that the confidence
interval is to be of size W with confidence coefficient 1 − α then the
number of cases needed is

$$N = \frac{4Mz_{1-\alpha/2}^2\,p'q'}{4z_{1-\alpha/2}^2\,p'q' + (M - 1)W^2} \tag{3.9}$$

For M very large these formulas differ very little from the earlier 
formulas where the population was taken to be infinite. 
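
The effect of the population size on the number of cases needed can be seen in a short sketch of Formula (3.9). The function name and the two population sizes tried are ours, chosen only for illustration; the specifications p' = .75, W = .05, and confidence coefficient .99 are those of the polling problem.

```python
from math import ceil
from statistics import NormalDist


def cases_needed_finite(M, p_guess, width, conf):
    """Formula (3.9): cases needed when sampling without replacement from M."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    k = 4 * z * z * p_guess * (1 - p_guess)
    return ceil(M * k / (k + (M - 1) * width ** 2))


n_small_pop = cases_needed_finite(10_000, 0.75, 0.05, 0.99)   # hypothetical M
n_huge_pop = cases_needed_finite(10 ** 9, 0.75, 0.05, 0.99)   # hypothetical M
print(n_small_pop, n_huge_pop)
```

With a population of ten thousand the requirement falls well below the 1991 cases found for an infinite population, while with a population of a thousand million it is indistinguishable from that figure.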

Sample Size in Tests of Hypotheses. The important statistical
problem of deciding upon the number of cases to be included in a
sample has already been considered in relation to studies in which
the objective is determination of a confidence interval for a
proportion. A similar decision must, of course, be made when the
objective is a test of a hypothesis. In this type of investigation, sample
size is determined by the necessity for minimizing the error of the
second kind.

The error of the first kind is controlled by selecting the level of
significance α, which determines the risk of rejecting the hypothesis
when it is true. The probability, β, of accepting the hypothesis
when some alternative is true is then controlled by the decision as
to the size of the sample.

Figure 3-6 shows the power curves for a test of H: P = .5 at
significance level α = .11 made on samples of three different sizes
(N = 10, N = 20, N = 48). In the discrete binomial distribution
only certain values of α can be realized. The three values of N
used here were chosen because in each of the three binomial
distributions with P = .5 and N = 10, N = 20, and N = 48, it
was possible to find a two-tailed critical region with approximately
the same value of α, namely α = .11.


[Figure 3-6: power curves. Vertical axis: probability of rejecting H: P = .5. Horizontal axis: scale of P, from 0 to 1.0.]
Fig. 3-6. Power function for test of hypothesis P = .5
with α = .11 for samples of size N = 10, 20, and 48.

If the hypothesis tested is true, all three samples entail
approximately the same probability of false rejection, that is, α = .11.
However if some alternative is true, such as P = .6, the larger 
sample provides much greater probability of rejecting the false 
hypothesis P = .5. 
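
Points on such power curves can be computed directly from the binomial distribution. In the sketch below the critical regions are our assumption, not taken from the figure: the hypothesis is rejected when the number of successes is at most c or at least N − c, with c = 2, 6, and 18 for N = 10, 20, and 48. These cut points were chosen because they give α close to .11 in each case.

```python
from math import comb


def reject_prob(N, c, P):
    """Probability that the count is <= c or >= N - c when the true proportion is P."""
    def term(k):
        return comb(N, k) * P ** k * (1 - P) ** (N - k)

    lower = sum(term(k) for k in range(0, c + 1))
    upper = sum(term(k) for k in range(N - c, N + 1))
    return lower + upper


cuts = {10: 2, 20: 6, 48: 18}   # assumed two-tailed cut points, see lead-in
alphas = {N: reject_prob(N, c, 0.5) for N, c in cuts.items()}
powers = {N: reject_prob(N, c, 0.6) for N, c in cuts.items()}

for N in cuts:
    print(N, round(alphas[N], 3), round(powers[N], 2))
```

The first column of results is the risk of false rejection when P = .5; the second is the power against the alternative P = .6, which rises steadily with the size of the sample.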

It will be helpful to answer several questions by examination 
of Figure 3-6. 

(a) What is the probability of rejecting H: P = .5 when P is
actually .6 if N = 10? If N = 20? If N = 48? (Answer: .18,
.26, .42)

(b) What is the probability of rejecting H: P = .5 when P = .2
if N = 10? If N = 20? If N = 48? (Answer: .68, .91, .999)

(c) Suppose a statistician decides to accept the risk of rejecting
the hypothesis P = .5 in 11% of samples if it is true and a risk of
accepting the hypothesis in 30% of samples if some alternative is
true, that is, α = .11 and β = .30. If he uses a sample of 48 cases,
how much will P have to differ from .5 in order that he may detect
the discrepancy with the probability agreed upon? Solution:
Figure 3-6 is drawn with α = .11, hence referring to that diagram
insures the chosen probability of type one error. If β = .30 the
probability of rejecting must be 1 − .30 = .70. Draw a horizontal
line through the point marked .7 on the vertical scale. This line
intersects the curve for N = 48 at two points for which P = .35
and P = .65. Then for .35 < P < .65 the test will have power
less than .70 and therefore will have probability greater than .30
of failing to reject H: P = .50. If P is .65 or .35 the test will have
power .70 of rejecting the hypothesis tested, and therefore the
probability of type two error will be .30. If P > .65 or P < .35
the power will be greater than .70 and so the probability of type
two error will be less than .30.

(d) Suppose a statistician decides upon significance level
α = .11 and upon probability not greater than .20 of accepting
H: P = .5 when P is actually .3. How large a sample should he
take to insure these results? Solution: Draw a vertical line
through P = .3 and a horizontal line through probability .8.
(β = 1 − .8 = .2.) These lines meet at a point between the curves
N = 20 and N = 48 and nearer to the latter. Therefore, a sample
of 20 cases will be inadequate while one of 48 cases will be
somewhat larger than is needed.

To solve problems of this sort by reference to a chart would 
require a separate chart for each different value of а which might 
be chosen, a separate chart for each value of the hypothesis 
tested, and on that chart a curve for each of many different values 
of N. As such charts are not available, and as such problems
are very real, it will be advantageous to develop another method 
of approach. 

Suppose that a State Superintendent of Instruction is con- 
cerned with the number of children in his state who hold super- 
stitious beliefs and with devising some means of reducing that 
number. He devises a measurement instrument, and trying it 
out in the ninth grade classes in a large number of schools, he finds 
that about 70% of the pupils return answers which would indicate 
they are “very superstitious.” A committee of teachers then 
develops teaching materials which they believe will reduce the 
prevalence of superstitious beliefs, but the teaching methods are 
somewhat time-consuming. Before they are put into general use 
it seems desirable to gain assurance that they really effect a re- 
duction in the proportion of superstitious pupils. 

Therefore it is planned that a new sample of ninth grade pupils
shall be taught by the new materials and then given the same
test on which previously 70% of the former sample were rated
“very superstitious.” The research worker conducting the study
asks a statistician how large the new sample should be. To find
a basis for answering that question certain requirements are de- 
cided on: (1) The hypothesis to be tested is to be that the new 
method is no better than the old one. This is formulated as 
P =.70. As this hypothesis is to be rejected only if p is smaller 
than .70, the critical region is in the left tail. (2) To limit the 
risk of adopting the new method if it is not superior to the old, they 
decide to adopt as significance level a = .05. (3) To limit the 
risk of not recognizing the superiority of the new method if such 
superiority exists, they decide that if P is really as small as .60
or smaller, they are willing to take only a 10% risk of not
discovering that superiority. That is, β = .10, so the power of the
test against the alternative P = .60 will be .90.

Having agreed upon these requirements they can solve the
question as to what size of N will be required. It must be assumed
that the sample is large enough to treat p as normally distributed
with mean P and standard error $\sqrt{PQ/N}$. There are two
distributions to be considered, one distribution under the hypothesis
$P_H = .70$ and another under the alternative $P_A = .60$. These two
distributions are shown in Figure 3-7.


[Figure 3-7: distributions of p under $P_H = .70$ and $P_A = .60$, with the critical value C marked on the baseline.]
Fig. 3-7. The distribution of p under the hypothesis $P_H = .7$ and under
the alternative $P_A = .6$.
Critical region under H is baseline to left of C.
α = .05 is represented by shaded area to left of C under curve specified by
H: P = .70.
β = .10 = area to right of C under curve specified by alternative P = .6.
Power of test = .90 = area to left of C under curve specified by alternative P = .6.


Since α = .05, we have from the curve at the right

$$z_{.05} = \frac{C - .7}{\sqrt{\dfrac{(.7)(.3)}{N}}}$$

and since β = .10, we have from the curve at the left

$$z_{.90} = \frac{C - .6}{\sqrt{\dfrac{(.6)(.4)}{N}}}$$

When the numerical values of $z_{.05} = -1.645$ and $z_{.90} = 1.282$ are
substituted in these expressions, each one is solved for C and the
results are equated, we get the equation

$$.7 - 1.645\sqrt{\frac{.21}{N}} = .6 + 1.282\sqrt{\frac{.24}{N}}$$

Solving this equation for N gives

$$N = \left(\frac{1.645\sqrt{.21} + 1.282\sqrt{.24}}{.1}\right)^2 = 191$$

as the estimated sample size necessary to meet the conditions 
agreed upon. 

To make the procedure general, let
$P_H$ = the proportion under the hypothesis to be tested
$P_A$ = the proportion under the alternative
α = significance level = probability of rejecting H when it is true
β = probability of accepting H when the alternative is true

Then

$$N = \left\{\frac{z_{1-\alpha}\sqrt{P_H Q_H} + z_{1-\beta}\sqrt{P_A Q_A}}{P_H - P_A}\right\}^2 \tag{3.10}$$
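
The general formula lends itself to a small routine. The sketch below is ours; the function name `cases_for_test` is hypothetical, and the normal deviates are obtained from the standard library instead of a printed table, so the unrounded result may differ slightly from hand computation before it is rounded up.

```python
from math import ceil, sqrt
from statistics import NormalDist


def cases_for_test(P_H, P_A, alpha, beta):
    """Formula (3.10): sample size for a one-tailed test with power 1 - beta at P_A."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(1 - beta)
    root = (z_alpha * sqrt(P_H * (1 - P_H))
            + z_beta * sqrt(P_A * (1 - P_A))) / (P_H - P_A)
    return ceil(root ** 2)


n_superstition = cases_for_test(0.70, 0.60, 0.05, 0.10)
print(n_superstition)
```

Because the ratio is squared, the same routine serves whether the alternative lies below or above the hypothetical value.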


EXERCISE 3.6 

1. In a random sample of 400 high school students in city A, 168 said 
in a test of general information that Iraq is the capital of Iran. Using a
confidence coefficient of .99, make an interval estimate of the proportion 
of high school pupils in the entire city who would make the same mistake. 

2. A superintendent of schools has stated that at least 60% of high 
school seniors expect to attend college. In a random sample of 200 cases, 
only 96 say they are planning for college. Does this refute the
superintendent’s statement? (Use a one-sided test with α = .02.)

3. How large a sample will be needed to test the hypothesis that 
P < .30 at significance level .02 if it is agreed that the test shall have power 
.80 when P = .35? 

4. Answer the same question if it is agreed that the test shall have 
power .90 when P = .50. 

5. How large a sample will be needed to test the hypothesis P = .50 
at significance level .01 if it is agreed that the test shall have power .90 
when P = .52? 


Test of Hypothesis That Two Population Proportions Are
Equal When Each Is Estimated from a Large Number of
Observations and When the Estimated Common Proportion Is Not
Near 0 or 1. In a public opinion poll taken in Peekskill, N.Y.
in 1946, Hedlund * compared two methods of selecting a sample
of persons to be interviewed on certain matters concerning the
Peekskill public schools. The Control Sample consisted of 97
adults selected at random by interviewing one person at every
38th address listed in the street guide of the latest Peekskill
directory. The Experimental Sample consisted of 365 persons
selected by high school students from among their adult
acquaintances. The method of selecting individuals for the experimental
sample is obviously more economical. However, if the
proportion of persons reported to hold a given opinion differs significantly
from one sample to the other, the selection of the experimental
sample must be presumed to involve bias.

One of the questions asked by Hedlund was, “Would you be
in favor of spending public money to add an evening school for 
adults to the services now offered by the Peekskill Public Schools?” 
The responses were obtained from populations of persons which 
may be the same or different in respect to their opinions on this 
issue. One may formulate the hypothesis that adult acquaint- 
ances of high school students have the same attitudes toward an 
evening school as the population of adults as a whole. 

There are two populations. One consists of the responses
which might have been obtained by questioning all adults in 
Peekskill, and the other consists of responses which might have 
been obtained from all acquaintances of high school students in 
this city. Both are finite populations but large enough for sam- 
pling distribution theory based on an infinite population to apply. 
The mathematical description of the populations can be formu- 
lated as in the adjacent display. 


Class                                All adults    All acquaintances
Answering “Yes”                        $P_1$            $P_2$
Answering “No” or “No opinion”         $Q_1$            $Q_2$

The question to be asked about the two populations might be
phrased: Is the proportion of “Yes” responses the same in both?
Mathematically the hypothesis can be formulated as $P_1 = P_2 = P$
or as $P_1 - P_2 = 0$. A hypothesis which states that the difference
between two parameters is zero is called the null hypothesis.


Level of significance and location of critical region must be
decided before data are examined. Let us assume that α is to be
.05. The logic of the problem appears to call for a two-sided test,
because we want to detect a difference in either direction.

Let $N_1$ be the number of cases in the sample for which the
observed proportion is $p_1$ and let $N_2$ and $p_2$ be the corresponding
values in the other sample. Then in the combined samples the
proportion of “Yes” answers is

$$p = \frac{N_1p_1 + N_2p_2}{N_1 + N_2} \tag{3.11}$$

and the proportion not answering “Yes” is q = 1 − p. Let
$N = N_1 + N_2$. The required statistic is

$$z = \frac{p_1 - p_2}{\sqrt{\dfrac{pqN}{N_1N_2}}} \tag{3.12}$$

As before, z denotes a variable with unit normal distribution.
Formula (3.12) requires an unnecessary amount of arithmetic
and risk of unnecessarily large rounding errors.

Let $n_1$ be the number of individuals answering “Yes” in the
first sample and $n_2$ be the number of individuals answering “Yes”
in the second, and let

$$n = n_1 + n_2$$

$$N = N_1 + N_2$$

Then using the integers listed above instead of the decimals $p_1$
and $p_2$, Formula (3.12) can be reduced to

$$z = \frac{n_1(N_2 - n_2) - n_2(N_1 - n_1)}{\sqrt{\dfrac{N_1N_2\,n(N - n)}{N}}} \tag{3.13}$$

These two formulas will now be applied to the data from Hedlund’s 
study. Both the numbers and the proportions in the two classes 
are shown. 


                                        Number                          Proportion
Class                         Adults  Acquaintances  Both     Adults  Acquaintances   Both
Those answering “Yes”           62        241         303     .6392      .6598       .6558
Those not answering “Yes”       35        124         159     .3608      .3402       .3442
Entire group                    97        365         462    1.0000     1.0000      1.0000


By Formula (3.12)

$$z = \frac{.6392 - .6598}{\sqrt{\dfrac{(.6558)(.3442)(462)}{(97)(365)}}} = \frac{-.0206}{.0534} = -.386$$

By Formula (3.13)

$$z = \frac{(124)(62) - (35)(241)}{\sqrt{\dfrac{(97)(365)(303)(159)}{462}}} = -.389$$


As the numerator and denominator of the first computation are 
carried to 3 significant digits only, an error must be expected 
in the third digit of the result. 

For large samples the statistic given in Formulas (3.12) and
(3.13) has a distribution which is approximately normal. If
neither $N_1$ nor $N_2$ nor n nor N − n is small the normal
approximation will be satisfactory. A correction to be used in small samples
will be found in Chapter 4 on page 106, where a statistic closely
related to this one is discussed.

Since it was agreed that α should be .05 we read $z_{.975}$ from the
normal probability table as 1.96 and $z_{.025}$ as −1.96. The region
of acceptance is therefore −1.96 < z < 1.96. The observed value
z = −.389 falls well within this region. The observed difference
in percentages is not large enough to throw any serious suspicion
on the hypothesis $P_1 - P_2 = 0$. Inasmuch as one method of
taking a sample is much easier than the other, and inasmuch as no
evidence has been found to indicate that the methods produce a
difference in proportions, the investigator is justified in using the
easier method of drawing the sample.
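
The two routes to z can be compared without intermediate rounding in a brief sketch. The function names are ours; the counts are those of Hedlund's study as tabulated above.

```python
from math import sqrt


def z_from_counts(n1, N1, n2, N2):
    """z computed from the integer counts, in the manner of Formula (3.13)."""
    n, N = n1 + n2, N1 + N2
    numerator = n1 * (N2 - n2) - n2 * (N1 - n1)
    return numerator / sqrt(N1 * N2 * n * (N - n) / N)


def z_from_proportions(n1, N1, n2, N2):
    """z computed from the proportions, in the manner of Formula (3.12)."""
    N = N1 + N2
    p1, p2 = n1 / N1, n2 / N2
    p = (n1 + n2) / N          # combined proportion, Formula (3.11)
    q = 1 - p
    return (p1 - p2) / sqrt(p * q * N / (N1 * N2))


z_counts = z_from_counts(62, 97, 241, 365)
z_props = z_from_proportions(62, 97, 241, 365)
accept = -1.96 < z_counts < 1.96   # two-sided test with alpha = .05

print(round(z_counts, 3), round(z_props, 3), accept)
```

When the proportions are carried to full machine precision the two formulas agree, which supports the remark that the difference between the hand computations arises from rounding alone.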


REVIEW EXERCISE 
1. The following terms have been introduced in this chapter. Be sure
that the meaning of them is clear to you.

acceptance value
alternative to a hypothesis
confidence belt
confidence coefficient
confidence interval
confidence limits
consistent estimate
critical region
error of the first type
error of the second type
finite population
interval estimate
level of significance
null hypothesis
one-sided hypothesis
power function
power of a test
region of acceptance
region of rejection
region of significance
rejection value
sample point
size of a region
standard error
test
two-sided hypothesis
unbiased estimate

2. Translate into words each of the following symbolic expressions: 
a H:P-3 d. P(p- 7|P = .6 and N = 40) =? 
b. E(p) = Р e. С(.43 < P < .49) = .98 
CH о = 7e 


3. For each of several situations there is given the hypothesis to be
tested, the observed value of the statistic, the region of acceptance. Draw
a circle around the letter A or R to indicate whether the decision should
be to accept or reject the hypothesis. Draw a circle around the number I
or II to indicate which type of error you are certain has not been made.


                                                               Type of error
H           Observed p    Region of acceptance    Decision    which has not
                                                               been made
P = .75        .82         .725 < p < .775          A  R         I   II
P = .40        .31         p < .50                  A  R         I   II
P = .50        .59         .44 < p < .56            A  R         I   II
P > .80        .78         p > .74                  A  R         I   II
P = .42        .41         .40 < p < .44            A  R         I   II

REFERENCES 


1. Clopper, C. J. and Pearson, E. S., “The Use of Confidence or Fiducial Limits
Illustrated in the Case of the Binomial,” Biometrika, 26 (1934), 404-413.

2. David, F. N., Probability Theory for Statistical Methods, 1949, Cambridge
University Press. Chapter 1, Fundamental Ideas. Chapter 2, Preliminary
Definitions and Theorems. Chapter 3, The Binomial Theorem in Probability. Chapter
4, Evaluation of Binomial Probabilities.

3. Eisenhart, C., Hastay, M. W. and Wallis, W. A., Selected Techniques of Statistical
Analysis, New York, 1947, McGraw-Hill Book Company, Inc., 331-335.

4. Hedlund, Paul A., Measuring Public Opinion on School Issues, unpublished
Ed.D. thesis, 1947, on file in the Library of Teachers College, Columbia
University.

5. National Bureau of Standards, Applied Mathematics Series 6, Tables of the
Binomial Probability Distribution, Washington, D.C., 1949, U.S. Government
Printing Office.

6. Neyman, Jerzy, First Course in Probability and Statistics, New York, 1950,
Henry Holt and Company. Chapter 2, Probability.

7. Walker, H. M., Mathematics Essential for Elementary Statistics, New York,
1951, Henry Holt and Company, 2d ed. Chapter 19, The Binomial Expansion.

8. Wilks, S. S., Elementary Statistical Analysis, Princeton, 1949, Princeton
University Press. Chapter 10, Confidence Limits for Population Parameters.
Chapter 11, Statistical Significance Tests.


4 Chi-square 


The simplest type of population — and a very important 
type —is the dichotomy discussed in the preceding chapters. 
Many of the most fundamental concepts of statistical inference 
have already been developed in relation to this simple two-class 
population and its parameter P, and will now be applied to
problems from other types of population. In this chapter we shall
study populations composed of several discrete classes. In later 
chapters we shall deal with populations on one or more continuous 
variables. A greater variety of questions can be asked — and 
answered — about such populations. Samples from them will 
furnish new statistics and some of those statistics will have sam- 
pling distributions other than the binomial and the normal distribu- 
tion, so that new probability tables must be employed. However 
we shall still be concerned with such basic ideas as the sampling 
distribution of a statistic, its expectation and its standard error, 
critical region, significance level, two types of error, and tests of 
hypotheses. 

Populations Consisting of Several Discrete Classes. Such 
populations are very common. Again we shall introduce the topic 
with a problem in which there are so few alternatives that the 
probabilities can be obtained by direct enumeration. The illus- 
tration we shall use is a study by an advertising agency to deter- 
mine which of three color tones is most desirable for a certain 
display. In order to ascertain the reaction of the public, the 
display is prepared in each of the three colors and is shown to a 
sample of persons. Each person is asked to state which of the 
three displays he prefers. 

The statistician in charge of the study would probably select 
a large sample of persons. A procedure for testing hypotheses 
with large samples will be described later. However, in order 
to provide a background for that procedure a calculation will 
first be carried through on the assumption that only three judges 
are used in the study. 


The possible responses of the 3 judges may be classified in
terms of the amount of agreement thus:

1. All three judges may vote for the same display. For
convenience we shall call this type of response 3, 0, 0. Since the
display for which they vote may be either I, II, or III, there are
3 ways in which they can make this type of response.

2. Two judges may agree on one display and the third judge
vote for a different one. For convenience we shall call this type
of response 2, 1, 0. It can be made in 18 different ways, which
are tedious but not difficult to enumerate. A few of these 18 ways
are listed here, the letters A, B, and C representing judges, and
the numerals I, II, and III the displays for which they vote.

Response I II III 
1 A — BC 
2 B AC — 
3 AC — B 


3. Each judge may select a different display. We shall call 
this type of response 1, 1, 1. It can be made in 3! = 6 different 
ways. 

The three types of response and their probabilities under the 
hypothesis that a randomly selected judge is as likely to vote for 
one display as for another, are therefore: 


Type of        Number of ways in which      Probability under
response       response can be made         the hypothesis
1, 1, 1                  6                        .222
2, 1, 0                 18                        .667
3, 0, 0                  3                        .111
                                                 1.000


Assume now that the three judges are chosen randomly and 
cast their votes independently and that all 3 vote for display II. 
Can the advertising agency be fairly sure that the public in general 
will prefer II? Could the agreement be a matter of chance? Let 
us test the hypothesis that in the population preferences are 
equally distributed so that a randomly selected judge is as likely 
to vote for one display as for another and consequently any of 
the 27 possible responses is as likely as any other. Then the 
probability that all three judges will select the same display is 
3/27 = .111. Since the highest agreement possible among 3 judges 
has so large a probability of occurring by chance in the absence 
of any real preference on the part of the general population, it
must be concluded that 3 judges are not enough to reach any
dependable conclusion. Let us therefore consider the same
problem with 6 judges.

Sampling Distribution of Response Types. We found there
were $3^3 = 27$ possible ways in which the votes of 3 judges could
be assigned to 3 displays, and that these 27 ways could be classified
under the three types 3, 0, 0; 2, 1, 0; and 1, 1, 1. With 6 judges
there will be $3^6 = 729$ possible ways in which their votes can be
allocated to the three displays.

Assume that the six judges are chosen randomly, and that of
the six, four choose display I, one chooses display II and one
display III. On the basis of these frequencies the statistician
wishes to test the hypothesis that in the population, from which
the judges are a sample, the preferences are equally distributed.
The 729 possible responses can be classified into the 7 types
shown in the first column of Table 4.1. The number of different
responses giving rise to each type is shown in the second column
of that table. The student is not expected to verify these
numbers but may be interested in knowing how they were obtained.
Consider the type 3, 2, 1. Three out of 6 judges vote for one
of the displays and these 3 judges can be selected in $\binom{6}{3} = 20$ ways.
Two out of the remaining judges vote for another display and they
can be chosen in $\binom{3}{2} = 3$ ways. The remaining judge will then
vote for the remaining display. The number of ways of grouping
votes of the 6 judges is therefore 20 × 3 × 1 = 60. However the
votes of 3, 2, 1 can be assigned in 6 different ways to the 3
displays, and therefore the total number of possible responses which
produce this type is 6 × 60 = 360. The other frequencies are
arrived at in similar fashion, though it is not particularly
important for the student to compute them.

TABLE 4.1 The Number of Possible Responses of Six Judges Each of Whom
Chooses Any One of Three Displays, Classified by Type of Response under the
Hypothesis that One Display is as Likely to be Chosen as Another.

Type        Number of             Probability     Cumulative
            responses in type     of type         probability
2, 2, 2           90                .1235           .1235
3, 2, 1          360                .4938           .6173
3, 3, 0           60                .0823           .6996
4, 1, 1           90                .1235           .8231
4, 2, 0           90                .1235           .9466
5, 1, 0           36                .0494           .9960
6, 0, 0            3                .0041          1.0001
Total            729               1.0001

Under the hypothesis that out of all judges who could be con- 
sulted the same proportion would vote for each display, so that a 
randomly selected judge is as likely to vote for one display as for 
another, the expected type of the votes of 6 judges is 2, 2,2. In 
Table 4.1 the 7 types have been arranged according to the degree 
of their conformity to this hypothetical type. The first type 
shows complete conformity, the last one, 6, 0, 0, shows the greatest 
possible discrepancy. In the next section will be presented a 
statistic by which that discrepancy can be measured. Clearly 
one would be quite ready to reject the hypothesis if a sample 
yields the type 6, 0, 0, since the risk of error in falsely rejecting 
a true hypothesis would then be only .004. If the observed type 
were 5, 1, 0, the probability that this type or one showing even 
less agreement with the hypothetical frequencies should occur 
when the hypothesis is true is .049 + .004 = .053, and most in- 
vestigators would reject the hypothesis on the evidence of such 
data. 

In the problem as stated, the judges’ votes were 4, 1, 1. There 
are 3 types of response which disagree with the hypothesis more 
than this one and the sum of the probabilities for these three is 
.177. The probability, under the hypothesis, of the observed 
pattern or something still more unlikely is .124 + .177 = .301. 
Clearly on the evidence from this response the advertising agency 
cannot assume display I to be more pleasing in general than dis- 
plays II and III. 
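
As a rough check on this tail probability, the sketch below rebuilds the type probabilities from multinomial coefficients and adds those for the observed type and the three types the text treats as more discrepant. The helper names are hypothetical. Working with exact fractions gives 219/729, or about .300; the .301 in the text arises from adding rounded probabilities.

```python
from fractions import Fraction
from math import factorial


def multinomial(counts):
    """Ways to split sum(counts) judges into ordered groups of the given sizes."""
    ways = factorial(sum(counts))
    for c in counts:
        ways //= factorial(c)
    return ways


def arrangements(rtype):
    """Distinct ways to attach the counts in a type to the three displays."""
    ways = factorial(len(rtype))
    for c in set(rtype):
        ways //= factorial(rtype.count(c))
    return ways


def type_probability(rtype, total=3 ** 6):
    """Probability of a response type when all 729 responses are equally likely."""
    return Fraction(multinomial(rtype) * arrangements(rtype), total)


observed = (4, 1, 1)
# Ordering of "more discrepant" types is taken from Table 4.1.
more_extreme = [(4, 2, 0), (5, 1, 0), (6, 0, 0)]

p_more = sum(type_probability(t) for t in more_extreme)
p_tail = type_probability(observed) + p_more
print(float(p_more), float(p_tail))
```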

The calculation just described is quite laborious for six cases. 
In a more realistic situation, where many more than six cases 
would be studied, the calculation would become utterly forbidding. 
It is customary in such problems to compute a statistic which 
measures the discrepancy between observed and hypothetical 
frequencies and to study the sampling distribution of that 
statistic. The statistic is known as χ², or chi-square. 

Chi-square as a Measure of Discrepancy between Observed 
and Expected Frequencies. For convenience in writing formulas 
we shall adopt the following symbolism: 

f_i is the number, or frequency, observed in the ith class, 

F_i is the number, or frequency, expected in the ith class in 
accordance with the proportions indicated by the hypothesis. 

The expected frequencies are computed on the assumption that 
the cases in the sample are apportioned according to the hy- 
pothesis so that the total expected frequency is the same as the 
total observed. The three displays define three classes over which 
the judges’ votes are distributed. The frequencies expected under 
the hypothesis are then F_1 = 2, F_2 = 2, and F_3 = 2, while the 
corresponding observed frequencies are f_1 = 4, f_2 = 1, and f_3 = 1. 

The sum of all the differences f_i − F_i will always be 0 since 
Σf_i = ΣF_i. (The reader unfamiliar with the symbol Σ should refer 
to page 111.) Therefore Σ(f_i − F_i) cannot be used as a measure 
of discrepancy. If these differences are squared before summing, 
the negative signs will conveniently disappear. The size of 
(f_i − F_i)² needs to be taken in relation to the expected value F_i, 
inasmuch as a difference of, say, 5 when F_i was, say, 10 would be 
much more important than a difference of 5 when F_i was, say, 
100. (The formula was of course arrived at mathematically and 
not by such intuitive argument as is advanced here.) Then the 
statistic 

        \chi^2 = \frac{(f_1 - F_1)^2}{F_1} + \frac{(f_2 - F_2)^2}{F_2} + \frac{(f_3 - F_3)^2}{F_3} 

or in general 

(4.1)   \chi^2 = \sum_i \frac{(f_i - F_i)^2}{F_i} 

measures discrepancy between observed and expected frequencies. 
The computation is then as follows: 


Class    f_i    F_i    f_i − F_i    (f_i − F_i)²    (f_i − F_i)²/F_i 
I         4      2        2             4              2.0 
II        1      2       −1             1               .5 
III       1      2       −1             1               .5 
Sum       6      6        0                            3.0 = χ² 
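
The same computation can be written as a short function. This is a sketch of Formula (4.1) only; the function name `chi_square` is an illustrative choice, not something defined in the text.

```python
def chi_square(observed, expected):
    """Formula (4.1): sum of (f_i - F_i)^2 / F_i over all classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of classes")
    return sum((f - F) ** 2 / F for f, F in zip(observed, expected))


observed = [4, 1, 1]   # f_i: votes for displays I, II, III
expected = [2, 2, 2]   # F_i under the hypothesis of equal preference
print(chi_square(observed, expected))
```

Applied to the judges' votes it reproduces the sum 2.0 + .5 + .5 from the table above.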


The Greek letter χ² (or chi-square) was first used to describe 
this statistic by Karl Pearson in 1900.* Its use is practically 
universal. In fact it is one of the very few statistical symbols 
used by almost every writer. There is no parameter corresponding 
to this statistic. 

Chi-square is a discrete variable. It is never negative, since 
each term in the numerator is a square and each denominator is 
a positive number. If the observed frequencies should agree 
completely with the hypothetical, χ² would be zero. χ² increases in 
size as the observed frequencies depart more and more from the 
hypothetical. 

The reader should compute χ² for each of the response types 
in Table 4.1 in the manner shown above for type 4, 1, 1 and 
compare his results with the values shown in Table 4.2. 

Exact Probabilities of Chi-square Obtained by Enumeration. 
In order to decide what to do with the hypothesis we are testing 
it is necessary to set up a critical region. In order to set up a 
critical region for χ² it is necessary to know its sampling 
distribution. We already have, in Table 4.1, the probability distribution 
of response type under the hypothesis that all possible responses 
are equally likely. However, response type is not a statistic which 
has any general usefulness, whereas χ² can be used in a very wide 
variety of situations to measure the discrepancy between observed 
and expected frequencies. We shall therefore convert the 
probability distribution for response type of Table 4.1 into the exact 
sampling distribution for χ², merely by computing the value of 
χ² corresponding to each response type. The two response types 
4, 1, 1 and 3, 3, 0 are then found to yield the same value of 
chi-square and so have been combined in Table 4.2. The probabilities 
in Table 4.2 are those already given in Table 4.1. The 
cumulative probability in the final column is the probability of obtaining 
by chance a value of chi-square which does not exceed the value 
listed in the χ² column. 


TABLE 4.2 Values of χ² Measuring Discrepancy between a Response Type and 
the Type 2, 2, 2, with Corresponding Probabilities and Cumulative Probabilities. 
(Probabilities as in Table 4.1.) 


      Type                 χ²    Probability    Cumulative 
                                                probability 
A     2, 2, 2               0       .124           .124 
B     3, 2, 1               1       .494           .617 
C     4, 1, 1 and 3, 3, 0   3       .206           .823 
D     4, 2, 0               4       .124           .947 
E     5, 1, 0               7       .049           .996 
F     6, 0, 0              12       .004          1.000 
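
The conversion from response types to an exact sampling distribution of χ² can be sketched by enumeration. The code below is an illustration under the chapter's assumptions (six judges, three equally likely displays). It uses exact fractions, so the merged probability for χ² = 3 appears as 150/729.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

N, K = 6, 3                  # six judges, three displays
expected = Fraction(N, K)    # 2 votes expected per display

# Accumulate the probability attached to each attainable value of chi-square.
distribution = defaultdict(Fraction)
for votes in product(range(K), repeat=N):
    counts = [votes.count(d) for d in range(K)]
    chi2 = sum((c - expected) ** 2 / expected for c in counts)
    distribution[chi2] += Fraction(1, K ** N)

cumulative = Fraction(0)
for chi2 in sorted(distribution):
    cumulative += distribution[chi2]
    print(int(chi2), round(float(distribution[chi2]), 3), round(float(cumulative), 3))
```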


Chi-square Curves and Table. To obviate the detailed and 
laborious computation required to determine the distribution of 
χ² as indicated in Tables 4.1 and 4.2, a set of smooth curves is 
available, and probabilities calculated from areas under these 
curves may be used as approximations to the exact χ² 
distributions calculated by enumeration. The approximation is very 
accurate for large samples. Good results from the use of these 
curves may be obtained when the expected frequency in each 
class is at least five. Figure 4-1 shows the relationship between 
the χ² distribution as calculated by enumeration and the smooth curve. 
The exact probabilities in this figure are obtained from Table 4.1, 
and the smooth curve from the mathematical formula for the curve. 


Fig. 4-1. Exact and smooth χ² distribution for samples of six cases 
grouped in three classes. (Vertical axis: probability density; 
horizontal axis: scale of χ².) 


Fig. 4-2. Exact and smooth cumulative χ² distribution for samples of 
six cases grouped in three classes. (Vertical axis: cumulative 
probability; horizontal axis: scale of χ².) 

The relation of the smooth curve and the step curve can be 
seen more clearly in the cumulative distributions of Figure 4-2. 
The smooth curve gives a better approximation to the exact 
probabilities when N is large than when N is small. To illustrate this 
fact, a computation was made of the probabilities attaching to 
the various values of χ² when 12 instead of 6 individuals are 
distributed in three classes. These computations are similar in 
nature to those reported in Tables 4.1 and 4.2. The figures will 
not be given here but the resulting cumulative probability values 
are shown graphically in Figure 4-3 with the same smooth χ² curve 
which appears in Figure 4-2. The smooth curve depends upon the 
number of classes and not on the number of individuals in those 
classes. In Figure 4-2 the steps are larger and so the 
probabilities read from the smooth curve and the step curve show greater 
disagreement than in Figure 4-3. The smooth curve provides 
probabilities which approximate fairly well the exact 
probabilities of the step curve in Figure 4-3 for probability values of 
.90 or more, and it is in the range of cumulative probabilities 
from .90 to 1.00 that the accuracy of approximation becomes 
important. As N increases, the exact probabilities come in general 
closer and closer to the approximate values read from the smooth 
χ² curve. 
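
The comparison of step curve and smooth curve can be reproduced numerically. The sketch below is an illustration, not the authors' computation. It enumerates the exact cumulative distribution for N = 6 and N = 12 cases in three equally likely classes. It relies on the standard closed form for the smooth χ² curve with 2 degrees of freedom, P(χ² ≤ x) = 1 − e^(−x/2). The function names are hypothetical.

```python
from fractions import Fraction
from math import exp, factorial


def exact_cdf(n_cases):
    """Exact cumulative distribution of chi-square for n_cases in 3 equally likely classes."""
    expected = Fraction(n_cases, 3)
    probs = {}
    for a in range(n_cases + 1):
        for b in range(n_cases - a + 1):
            c = n_cases - a - b
            # Number of responses producing the ordered counts (a, b, c).
            ways = factorial(n_cases) // (factorial(a) * factorial(b) * factorial(c))
            chi2 = sum((x - expected) ** 2 / expected for x in (a, b, c))
            probs[chi2] = probs.get(chi2, 0) + Fraction(ways, 3 ** n_cases)
    running, cdf = Fraction(0), []
    for chi2 in sorted(probs):
        running += probs[chi2]
        cdf.append((float(chi2), float(running)))
    return cdf


def smooth_cdf_2df(x):
    """Smooth chi-square curve with 2 degrees of freedom: 1 - exp(-x/2)."""
    return 1.0 - exp(-x / 2.0)


for n_cases in (6, 12):
    print("N =", n_cases)
    for chi2, p_exact in exact_cdf(n_cases):
        if p_exact >= 0.90:
            print(f"  chi2={chi2:5.2f}  exact={p_exact:.3f}  smooth={smooth_cdf_2df(chi2):.3f}")
```

Only the upper range of cumulative probabilities is printed, since that is where the text says accuracy matters.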


Fig. 4-3. Exact and smooth cumulative χ² distributions for samples of twelve 
cases grouped in three classes. (Vertical axis: cumulative 
probability; horizontal axis: scale of χ².) 

The burden of studying hypotheses in populations consisting 
of several classes can now be shifted to probabilities calculated, by 
use of the calculus, from the smooth χ² curves. These 
probabilities are tabulated in Table VIII in the Appendix. 


Fig. 4-4. Smooth χ² curve for 4 degrees of freedom, with selected 
percentiles indicated on the baseline. (Vertical axis: probability 
density; horizontal axis: scale of χ².) 


We have already seen that what is called “the binomial 
distribution” is actually not one distribution but a whole family 
of distributions, the members of that family differing from each 
other as P and N change. The chi-square curve is also not a 
single curve but a family of curves. In Table VIII each 
horizontal row relates to one particular curve in this family of smooth 
curves. Subscripts of χ² at the top of the table indicate per cents 
of area under the curve. The relation of these per cents of area 
to the tabular entries is illustrated in Figure 4-4 which shows 
the curve for the horizontal row n = 4. Select one of the columns 
in Table VIII, say the one headed χ²_.10, and read the tabular entry 
in the given row and column. It is 1.1. The relation between 
χ²_.10 and 1.1 may be expressed in any one of the following ways, 
all of which have the same meaning, but with slightly different 
emphasis: 

10% of the area under this curve lies to the left of an 
ordinate at χ² = 1.1 

P(χ² < 1.1) = .10 

P(χ² > 1.1) = .90 

The 10th percentile of this distribution is χ² = 1.1 
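
These statements can be checked without the table when n is even. For even n the area under the smooth curve has a closed form, 1 − e^(−x/2) Σ (x/2)^j / j!, summed over j = 0 to n/2 − 1. The sketch below assumes that standard formula. The function name is an illustrative choice.

```python
from math import exp, factorial


def chi2_cdf_even(x, n):
    """P(chi2 < x) for an even number n of degrees of freedom (closed form)."""
    if n <= 0 or n % 2:
        raise ValueError("this closed form needs a positive even n")
    half = x / 2.0
    return 1.0 - exp(-half) * sum(half ** j / factorial(j) for j in range(n // 2))


p_below = chi2_cdf_even(1.1, 4)
print(round(p_below, 3), round(1 - p_below, 3))
```

For n = 4 and χ² = 1.1 the area to the left comes out near .106, which agrees with the tabled .10 to the accuracy of the rounded entry 1.1.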


The rows of Table VIII are distinguished by entries in the 
column headed n. These values of n are known as degrees of 
freedom. As each row relates to one particular curve in the set 
of smooth curves, the form of the curve is seen to be determined 
by its degrees of freedom. 

Degrees of Freedom. The term “degrees of freedom” will 
occur again and again, not only in connection with chi-square 
but also in a great variety of other problems. It will be worth 
while to pause now for clarification of that concept. 

Suppose you are asked to write 3 numbers with no restrictions 
upon them. You have complete freedom of choice in regard to 
all 3. There are 3 degrees of freedom. 

Now suppose you are asked to write 3 numbers with the re- 
striction that their sum is to be some particular value, say 20. 
You cannot now choose all 3 freely, but as soon as 2 have been 
chosen the third is determined. Your choices are governed by 
the necessary relation X_1 + X_2 + X_3 = 20. In this situation there 
are only 2 degrees of freedom. The number of variables is 3, 
but the number of restrictions upon them is 1, and the number 
of “free” variables, or independent choices, is 3 — 1 = 2. 

Now suppose you are asked to write 5 numbers such that their 
sum is 30 and also such that the sum of the first two is 18. There 
are 5 variables but you do not have freedom of choice with re- 
spect to all 5. You cannot write 5 numbers arbitrarily and have 
them conform to the 2 restrictions that X_1 + X_2 = 18 and 

X_1 + X_2 + X_3 + X_4 + X_5 = 30. 

As soon as you select X_1, then X_2 = 18 − X_1 and is completely 
determined. Since X_3 + X_4 + X_5 = 30 − 18 = 12, only two of the 
numbers X_3, X_4 and X_5 can be freely chosen. As one of the 
numbers X_1 and X_2 can be freely chosen there are 3 free choices. 
The number of degrees of freedom is n = 5 − 2 = 3. 

In every statistical problem in which degrees of freedom are 
involved it is necessary to determine the number of “free 
variables” by first noting the total number of variables and reducing 
that number by the number of independent restrictions upon 
them. In the preceding paragraph, for instance, one might think 
there are 3 restrictions, namely 

X_1 + X_2 = 18;  X_3 + X_4 + X_5 = 12;  and  X_1 + X_2 + X_3 + X_4 + X_5 = 30. 

However only two of these are independent, since any one of them 
can be deduced from the other two. (For further discussion of 
degrees of freedom, see references 11 and 12.) 
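
Counting independent restrictions amounts to finding the rank of their coefficient matrix. The following sketch is an illustration that goes beyond the text's informal argument. It does this for the three restrictions just listed, using exact fractions. The `rank` helper is hypothetical.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix (list of rows) by Gaussian elimination with exact fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for col in range(len(m[0])):
        # Find a row at or below position r with a nonzero entry in this column.
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                factor = m[i][col] / m[r][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


# Coefficients of X1..X5 in the three restrictions listed in the text.
restrictions = [
    [1, 1, 0, 0, 0],   # X1 + X2 = 18
    [0, 0, 1, 1, 1],   # X3 + X4 + X5 = 12
    [1, 1, 1, 1, 1],   # X1 + X2 + X3 + X4 + X5 = 30
]
independent = rank(restrictions)
degrees_of_freedom = len(restrictions[0]) - independent
print(independent, degrees_of_freedom)
```

The third row is the sum of the first two, so only two restrictions are independent and 5 − 2 = 3 variables remain free.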

Returning now to the problem of the advertising agency we 
note that the responses of six persons are distributed in three 
classes. Let X_1, X_2, and X_3 be the numbers in those classes. 
There are thus 3 variables. These are restricted by the relation 
X_1 + X_2 + X_3 = 6. Only two of the variables can be chosen freely; 
two are free and the third is bound. (Do not be troubled by the 
fact that you are not free to choose a negative number, or a frac- 
tion, or a number larger than 6 in this problem. Even though 
choice of a variable can be made only from the digits 0, 1, 2, 3, 4, 
5 or 6 the variable is still considered free.) If instead of 6 judges 
there had been 1000 we should have had X_1 + X_2 + X_3 = 1000 
and the number of degrees of freedom would still be n = 3 — 1 = 2. 
Thus for all problems testing a hypothesis about the proportion 
of cases falling in each of 3 classes, when no restriction is laid 
down except that giving the total number of cases (here 6), the 
chi-square table may be entered with 2 degrees of freedom. In 
general, if there are k classes in such a problem there would be 
k − 1 degrees of freedom, regardless of how many cases are 
distributed in the k classes. Other types of problem with a variety 
of numbers of restrictions and of degrees of freedom will be con- 
sidered in succeeding sections. 

Reading the Chi-square Table. In Figure 4-5 are shown five 
of the smooth χ² curves for which probabilities are listed in Table 
VIII, namely the curves for which n = 1, 2, 4, 8, and 10. It may 
be helpful to consider the table and this figure together. Consider 
a member of the family not drawn there, say the one for which 
n = 3. Now find those entries in row n = 3 of Table VIII which may 
be interpreted symbolically to mean the following: 

P(χ² < .35 | n = 3) = .05 

P(χ² < 6.3 | n = 3) = .90 

Locate .35 and 6.3 on the baseline of Figure 4-5 and draw an 
ordinate at each point. Figure 4-5 contains no curve for n = 3, but 
its form is intermediate between those drawn for n = 2 and n = 4, 
and the proportion of area under it to the left of these 
ordinates would be approximately .05 and .90 respectively. 
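
Table entries for any n, odd or even, can be checked with a short numerical routine. The sketch below assumes the standard series for the regularized lower incomplete gamma function, since the area under the smooth χ² curve to the left of x equals P(n/2, x/2). The function name `chi2_cdf` and the cutoff of 500 terms are illustrative choices.

```python
from math import exp, lgamma, log


def chi2_cdf(x, n, terms=500):
    """P(chi2 < x) for n degrees of freedom, via the series for the
    regularized lower incomplete gamma function P(n/2, x/2)."""
    if x <= 0:
        return 0.0
    a, z = n / 2.0, x / 2.0
    term = 1.0 / a          # k = 0 term of sum z^k / (a (a+1) ... (a+k))
    total = term
    for k in range(1, terms):
        term *= z / (a + k)
        total += term
        if term < 1e-15 * total:
            break
    # Multiply by z^a e^(-z) / Gamma(a), computed in logs for stability.
    return total * exp(a * log(z) - z - lgamma(a))


print(round(chi2_cdf(0.35, 3), 3))
print(round(chi2_cdf(6.3, 3), 3))
```

For n = 3 the two results fall close to .05 and .90, as the table entries indicate.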


Fig. 4-5. χ² curves for 1, 2, 4, 8, and 10 degrees of freedom. 
(Horizontal axis: scale of χ².) 


It is very convenient to remember that E(χ²) = n. This is 
almost the only guidepost which most people carry in their minds 
with respect to the χ² distribution. It means that if one obtains 
an observed value of χ² smaller than the number of degrees of 
freedom involved in a problem, one usually feels satisfied to decide 
at once, without consulting the probability table, that the discrep- 
ancy between hypothetical and observed frequencies is negligible. 


EXERCISE 4.1 
1. Verify the following statements by reference to Table VIII and 
translate them into words: 
a. P(χ² < 9.9 | n = 21) = .02 
b. P(χ² < 43.2 | n = 27) = .975 
c. P(χ² < 2.7 | n = 1) = .90 
d. P(χ² > 30.6 | n = 15) = .01 
e. P(χ² > 15 | n = 6) = .02 
f. P(χ² > 8 | n = 16) = .95 
g. P(χ² > 40.1 | n = 27) = .05 
h. P(14.6 < χ² < 37.7 | n = 25) = .90 
i. P(7.4 < χ² < 40.0 | n = 20) = .99 
j. P(8.5 < χ² < 22.3 | n = 15) = .80 
k. .01 < P(χ² < 14 | n = 28) < .02 
l. .05 < P(χ² > 28 | n = 19) < .10 
m. .005 < P(χ² > 42 | n = 22) < .01 
n. .01 < P(χ² > 42 | n = 24) < .02 
o. .025 < P(χ² > 20 | n = 11) < .05 
p. P(χ² < 3.2 | n = 13) < .005 
q. P(χ² > 32 | n = 10) < .001 

2. Suppose you have decided that whenever the observed χ² is equal 
to or greater than χ²_.95 you will reject whatever hypothesis you are testing 
and whenever it is less than χ²_.95 you will accept that hypothesis. For each 
of the situations described below, read the appropriate value of χ²_.95 from 
the table and circle A or R to indicate whether the hypothesis under 
consideration should be accepted or rejected, as illustrated in a. 


% 2 hint О. ЖО 
а 2, 87 eee Ome БОБ HIS feet АЕ 
b. 7, ^48 иес 253059 40:2 oves УВ 
с. T ЛЕЗИНА Е. Bio O71 Y КАЗИ 
d. Лу 108) (AeA, о © TU ART 
e. 15, «46g ПАВ ео би асре ENS 


3. Carry out the instructions for the preceding question but using the 
.01 significance level (that is, refer to χ²_.99) instead of the .05. Compare 
the decisions as to disposition of the hypothesis reached in the two 
situations. 


Comparison of Observed and Theoretical Frequencies by 
Means of χ². The problem of testing a hypothesis regarding the 
proportion in the classes of a population has been discussed at 
length for a small sample. Now that the χ² table is available, 
problems involving a larger number of cases can be considered. 

Suppose that in the study of preferences for advertising 
displays 60 cases have been used instead of 6. The hypothesis to 
be tested is still that the proportions in the population are equal. 
Then the 60 cases are expected to be distributed equally in their 
preference. Hence F_1 = F_2 = F_3 = 20. Suppose that the observed 
frequencies are 30, 18, 12, so that f_1 = 30, f_2 = 18, f_3 = 12. 
Substituting these values into Formula (4.1) we compute the value of 
chi-square as 

        \chi^2 = \frac{(30 - 20)^2}{20} + \frac{(18 - 20)^2}{20} + \frac{(12 - 20)^2}{20} = 8.4 

The χ² table shows that this value leads to rejection of the 
hypothesis at both the .05 and .02 levels, but to its acceptance at 
the .01 level. 
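
The arithmetic and the three decisions can be sketched as follows. The code relies on the fact that with n = 2 degrees of freedom the area under the smooth curve to the right of x is e^(−x/2). The variable names are illustrative.

```python
from math import exp

observed = [30, 18, 12]   # f_i
expected = [20, 20, 20]   # F_i under equal preference

chi2 = sum((f - F) ** 2 / F for f, F in zip(observed, expected))

# With n = 2 degrees of freedom the smooth curve gives P(chi2 > x) = exp(-x / 2).
p_upper = exp(-chi2 / 2)
print(round(chi2, 1), round(p_upper, 4))

for level in (0.05, 0.02, 0.01):
    print(level, "reject" if p_upper < level else "accept")
```

The upper-tail probability falls between .01 and .02, which is why the hypothesis is rejected at the .05 and .02 levels but not at the .01 level.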

Problems leading to a comparison of observed with theoretical 
frequencies arise when one wishes to determine whether an 
observed sample has been drawn from a population having some 
theoretical form such as the normal. This problem will be taken 
up in Chapter 5. 

EXERCISE 4.2 

1. Several situations will be described briefly. No computations are 
to be made, but in each the task is to decide upon a number denoted n 
which is the number of degrees of freedom and upon another number 
denoted N which is the number of observations. 

(a) In a college class of 10 students, each student is asked to appraise 
a text as either excellent (E), good (G), mediocre (M), or poor (P), and 
their appraisals are to be used to test the hypothesis that college students 
in general would show judgments evenly divided in the 4 categories. For 
text A the appraisals were E, 0; G, 5; M, 4; P, 1. Then n = ____, 
N = ____. 

(b) Let the situation be the same as in question (a) except that there 
are 50 students whose appraisals of text A are E, 3; G, 31; M, 12; P, 4. 
Then n = ____, N = ____. 

2. If the exact and the smooth χ² distributions were fitted to the data 
of questions (a) and (b) in the preceding example, do you think the fit of 
the smooth curve would be better in one situation than in the other? 
If so, in which? 

3. In a toss of 4 pennies the distribution of number of heads falling 
upward has the following probabilities, obtainable from the binomial 
distribution: 


Number of heads    0       1      2      3      4 
Probability        1/16    1/4    3/8    1/4    1/16 


Using these probabilities, what are the expected frequencies in the five 
classes, if the throw of four pennies is repeated 64 times? 
What is χ² if the observed frequencies in 64 throws are 


Number of heads       0     1     2     3     4 
Observed frequency    2    12    23    20     7 


Does the value of χ² warrant a judgment that one or more of the 
pennies are biased? 

4. Answer question 3 if the pennies were tossed 128 times and observed 
frequencies of heads were 4, 24, 46, 40, and 14. The problem here is to 
note what happens to χ² when every f_i and every F_i is doubled. 

5. Answer question 3 if the pennies were tossed 64r times and the 
observed frequencies of heads were 2r, 12r, 23r, 20r, and 7r. Make a 
generalization as to effect on χ² of multiplying observed and theoretical 
frequencies by r, keeping proportions unchanged. 


Computing Chi-square from Data in Form of Per Cents. 
When data are given in per cents rather than frequencies it is 
convenient to compute χ² by the formula 

(4.2)   \chi^2 = N \sum_i \frac{(p_i - P_i)^2}{P_i} 

where p_i is the observed proportion in the ith class and P_i is the 
expected proportion. If only per cents are given and the total 
sample size N is unknown, χ² cannot be computed. To use per 
cents as though they were frequencies would be to proceed as 
though N were 100. If N is less than 100 this would give too 
large a value of χ² and might lead to false rejection of a true 
hypothesis more often than is indicated by the level of significance. 
If N is greater than 100, χ² would be underestimated and the 
hypothesis accepted more often than should be the case. 
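
The effect of the sample size on Formula (4.2) can be sketched with the 60-case example from earlier in the chapter, expressed as proportions. The function name is hypothetical. The value N = 100 in the second call represents what treating per cents as frequencies would silently assume.

```python
def chi_square_from_proportions(observed_props, expected_props, n_cases):
    """Formula (4.2): N * sum of (p_i - P_i)^2 / P_i, with proportions (not per cents)."""
    return n_cases * sum(
        (p - P) ** 2 / P for p, P in zip(observed_props, expected_props)
    )


p = [0.50, 0.30, 0.20]       # observed proportions (30, 18, 12 out of 60)
P = [1 / 3, 1 / 3, 1 / 3]    # hypothetical proportions

chi2_true_n = chi_square_from_proportions(p, P, 60)     # actual sample size
chi2_as_100 = chi_square_from_proportions(p, P, 100)    # per cents treated as frequencies
print(round(chi2_true_n, 2), round(chi2_as_100, 2))
```

Because χ² is proportional to N for fixed proportions, assuming N = 100 when the true N is 60 inflates the statistic by the factor 100/60.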

Tests of Independence in Contingency Tables. The fore- 
going discussion has dealt with tests of hypotheses that the popu- 
lation proportions in several classes are in agreement with certain 
hypothetical proportions. We shall now consider situations in 
which individuals are classified according to two discrete varia- 
bles, and the problem deals with the relationship between those 
variables. 

As an example of such a situation consider data taken from 
Rope’s study of Opinion Conflict and School Support.* Each of 
1464 adult residents of Pittsburgh, Pa., was interviewed on a 
variety of issues related to tax support. One of the questions 
asked was, “Do you think tax money should, or should not, be 
spent on nursery schools for children less than four and a half 
years old?” Responses were classified as (1) favorable, (2) no 
opinion, and (3) unfavorable. The problem was to determine 
whether there is a relationship between type of response and age 
of respondent. Each individual was then classified both according 
to type of response and according to age. The resulting distribu- 
tion appears in Table 4.3. 


TABLE 4.3 Frequency of Observed Response to the Question of Spending Tax 
Money for Nursery Schools, Classified According to Nature of Response and Age 
of Respondent.* 


                        Group aged   Group aged   Group      Total 
                        20-34        35-54        over 54 
Favorable response         153          182           65       400 
No opinion                  35           50           25       110 
Unfavorable response       377          417          160       954 
Total                      565          649          250      1464 


* Taken from Rope, Opinion Conflict and School Support. 

A two-way distribution like that in Table 4.3 in which the 
categories are discrete is called a contingency table. To study the 
problem of relationship between the variables, the hypothesis of 
independence is formulated. This hypothesis is then tested by 
an application of the χ² method. 

The methods of dealing with the hypothesis of independence 
will be developed in their most general form for a contingency table 
with more than two classes in each of the variables. While these 
methods are applicable to all two-variable contingency tables, 
simplifications of the computational techniques are available for 
contingency tables in which (1) one of the variables has two 
classes and the other more than two classes, and (2) each of the 
variables has two classes. 

a. More than two classes for each variable. This situation is 
illustrated by the data in Table 4.3. If the hypothesis of inde- 
pendence is true, then, in the population, the proportions in any 
one response row are equal to those in every other response row, 
and the proportions in any one age column are equal to the pro- 
portions in every other age column. Therefore, the best estimates 
of the population proportions in any row are the proportions 
indicated by column totals and the best estimates of the propor- 
tions in any column are the proportions indicated by the row 
totals. Either row totals or column totals can be used to obtain 
expected values for a x? test. 

If we use row totals then for any of the age groups the popu- 
lation proportions are as follows: 


Favorable: 400/1464 
No opinion: 110/1464 
Unfavorable: 954/1464 


The expected frequencies for any age group can be computed 
by apportioning all the individuals in the age group according 
to these proportions. Thus for the 20-34 age group the expected 
frequencies are 


Favorable:     (400/1464)(565) = 154.4 
No opinion:    (110/1464)(565) =  42.4 
Unfavorable:   (954/1464)(565) = 368.2 

Total                            565.0 


By use of the same proportions the individuals in the remain- 
ing age groups may be apportioned similarly according to response 
pattern. The expected values resulting from these computations 
are displayed in Table 4.4. 
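
The apportioning just described reduces to one rule: each expected cell is the row total times the column total, divided by the grand total. A minimal sketch, with an illustrative function name, applied to the observed frequencies of Table 4.3:

```python
observed = [
    [153, 182, 65],    # favorable
    [35, 50, 25],      # no opinion
    [377, 417, 160],   # unfavorable
]


def expected_frequencies(table):
    """Expected cell = (row total) * (column total) / (grand total)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


expected = expected_frequencies(observed)
for row in expected:
    print([round(cell, 1) for cell in row])
```

The results agree with the printed expected values to within rounding in the last decimal place.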


TABLE 4.4 Expected Frequency of Response to the Question of Spending Tax 
Money for Nursery Schools, on the Hypothesis that Age and Response are 
Independent. (Data from Table 4.3.) 


                        Group aged   Group aged   Group      Total 
                        20-34        35-54        over 54 
Favorable response        154.4        177.3        68.3       400 
No opinion                 42.4         48.8        18.8       110 
Unfavorable response      368.2        422.9       162.9       954 
Total                     565.0        649.0       250.0      1464 


To write formulas for χ² the following notation for the observed 
and expected values will be adopted: 


Observed Frequencies        Expected Frequencies 

f_11   f_12   f_13          F_11   F_12   F_13 
f_21   f_22   f_23          F_21   F_22   F_23 
f_31   f_32   f_33          F_31   F_32   F_33 


The formula for χ² can now be written as 

        \chi^2 = \sum_i \sum_j \frac{(f_{ij} - F_{ij})^2}{F_{ij}} 

From the values in Tables 4.4 and 4.3, χ² can be calculated by 
Formula (4.1). However, a simpler computational routine is 
provided by Formula (4.3). 

(4.3)   \chi^2 = \sum_i \sum_j \frac{f_{ij}^2}{F_{ij}} - N 

If data are in the form of proportions this formula becomes 

(4.4)   \chi^2 = N \left( \sum_i \sum_j \frac{p_{ij}^2}{P_{ij}} - 1 \right) 

Direct substitution in Formula (4.3) is carried out as follows: 

        \chi^2 = \frac{(153)^2}{154.4} + \frac{(35)^2}{42.4} + \cdots + \frac{(25)^2}{18.8} + \frac{(160)^2}{162.9} - 1464 = 4.1 


The computed value of χ² = 4.1 measures the discrepancy 
between the set of observed frequencies and the set of theoretical 
frequencies. But is that discrepancy extreme? Is the 
distribution of observed frequencies unusual in view of the hypothesis? 
The number of possible patterns in which the 1464 cases could 
be distributed in 9 classes is enormous. To compute the exact 
distribution and obtain from it the probability of a frequency 
distribution at least as exceptional as the one observed, as was 
done for a small sample in the first part of this chapter, would be a 
very large task. The smooth χ² curve with the appropriate number 
of degrees of freedom will provide a close approximation to the 
exact probability. 
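
The equivalence of the defining formula and the shortcut, χ² = ΣΣ f²/F − N, can be confirmed numerically. The sketch below uses illustrative names and computes both from the observed frequencies of Table 4.3, with unrounded expected frequencies. Computed this way the statistic is about 4.04, close to the 4.1 reported in the text, which was worked from rounded expected values.

```python
observed = [
    [153, 182, 65],    # favorable
    [35, 50, 25],      # no opinion
    [377, 417, 160],   # unfavorable
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
N = sum(row_totals)

# Expected frequencies under independence, kept unrounded.
expected = [[r * c / N for c in col_totals] for r in row_totals]

cells = [(f, F) for f_row, F_row in zip(observed, expected) for f, F in zip(f_row, F_row)]

chi2_definition = sum((f - F) ** 2 / F for f, F in cells)   # sum of (f - F)^2 / F
chi2_shortcut = sum(f * f / F for f, F in cells) - N        # sum of f^2 / F, minus N

print(round(chi2_definition, 2), round(chi2_shortcut, 2))
```

The shortcut subtracts two large, nearly equal numbers, so it is more sensitive to rounding of the expected frequencies than the defining formula.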

The number of degrees of freedom is the number of classes 
less the number of restrictions. When there are r rows and c 
columns the number of classes is rc. There is one restriction for 
each of the r rows, namely 

        f_{i1} + f_{i2} + \cdots + f_{ic} = F_{i1} + F_{i2} + \cdots + F_{ic} 

This equation means that for any one row, there is a free choice 
of frequency for all cells but one (that is, for c − 1 cells) and the 
frequency of that cell must be whatever is needed to make the 
sum of the f_ij for the row equal to the sum of the F_ij for that row. 
Similarly there is one restriction for each of the c columns, and 
in that column there is free choice of frequency for all cells but 
one (that is, for r − 1 cells). However there are not r + c different 
restrictions but only r + c − 1, for any one of them can be obtained 
from the others. 

The number of classes minus the number of restrictions is 
therefore rc − (r + c − 1) = rc − r − c + 1 = (r − 1)(c − 1). Also 
the number of degrees of freedom is the number of classes for 
which there is a free choice of frequency. In any one row there 
is such free choice for c — 1 classes. If c — 1 frequencies are filled 
arbitrarily in r — 1 of the rows, the frequencies for all classes in 
the remaining row will be foreordained. Therefore the number of 
classes for which there is free choice of frequency is 


(4.5) n = (r — 1)(c — 1) 


which is the number of degrees of freedom in problems of this 
type. 
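
The "free choice" argument can be made concrete. Once the marginal totals are fixed, the (r − 1)(c − 1) cells in the upper-left block determine every other cell. The sketch below uses a hypothetical helper, `complete_table`, to rebuild Table 4.3 from four of its cells and its margins.

```python
def complete_table(free_cells, row_totals, col_totals):
    """Fill in an r x c table from its (r-1) x (c-1) upper-left block and the margins."""
    # Last cell of each of the first r-1 rows is forced by the row total.
    rows = [list(r) + [rt - sum(r)] for r, rt in zip(free_cells, row_totals)]
    # Every cell of the last row is forced by the column totals.
    last = [ct - sum(col) for ct, col in zip(col_totals, zip(*rows))]
    return rows + [last]


free_cells = [[153, 182], [35, 50]]          # (3 - 1)(3 - 1) = 4 freely chosen cells
table = complete_table(free_cells, [400, 110, 954], [565, 649, 250])
for row in table:
    print(row)
```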

For this problem n = 4 and χ² = 4.1. In the χ² table the entry 
which is next larger than the observed value is χ²_.75 = 5.4. If all 
possible samples of the given size were drawn from a 
population with the given theoretical frequencies, over 25% of those 
samples would have frequency distributions for which χ² was 
greater than the observed χ². This sample is indeed not 
exceptional under the hypothesis. It is appropriate to assume that 
the three age groups are homogeneous with respect to opinion 
on the use of tax money for nursery school. It would not be 
appropriate to seek for a possible explanation as to why the per 
cent of favorable response was larger (28.1%) for the group aged 
35-54 than for the older group (26.0%) inasmuch as it has been 
shown that chance provides a reasonable explanation for the 
observed differences. 

b. Two classes for one variable, more than two for the other. This 
type of problem may also be illustrated with data from Rope’s 
Opinion Conflict and School Support. In his sample of 1464 per- 
sons, 707 were men and 757 women. To the question “Some say 
that in future years the only way the schools can keep up the 
services they are giving today is to increase taxes. If this is true, 
should school services be cut or taxes increased?” they expressed 
opinions distributed as in Table 4.5. Are these data consistent 
with the hypothesis that the per cent holding opinions in each of 
the three categories is unrelated to the sex of respondents? 


TABLE 4.5 Frequency of Observed Response to the Question of Increasing 
Taxes to Support Current School Services, Classified by Type of Response and 
Sex of Respondent.* 


                                    Men    Women   Total 
Approve increasing taxes            296     236     532 
Have no opinion                     183     295     478 
Do not approve increasing taxes     228     226     454 
Total                               707     757    1464 


* Data taken from Rope, Opinion Conflict and School Support. 


In this type of problem χ² can be computed exactly as it was 
computed in the preceding section, but there is available an 
equivalent routine which involves less numerical work. Let a_i and b_i 
be frequencies in the two groups in the ith class and n_i = a_i + b_i. 
Let 

        \sum_{i=1}^{k} a_i = A, \quad \sum_{i=1}^{k} b_i = B, \quad \sum_{i=1}^{k} n_i = N 

Then A + B = N. There are 2k classes. The number of degrees 
of freedom is n = k − 1. The usual formula for χ² can be shown 
to reduce to a special formula, valid only for the case described 
in the heading of this paragraph or described alternately as the 
study of the independence of two traits when one of them is a 
dichotomous trait. 

(4.6)   \chi^2 = \frac{N^2}{AB} \left( \sum_{i=1}^{k} \frac{a_i^2}{n_i} - \frac{A^2}{N} \right) = \frac{N^2}{AB} \left( \sum_{i=1}^{k} \frac{b_i^2}{n_i} - \frac{B^2}{N} \right) 

The numerical computations for the data of Table 4.5 are shown in 
Table 4.6. The student should verify them by carrying through 
the same procedure using the b; frequencies instead of the a;. 
He may also wish to use Formula (4.3) to convince himself that 
the method of Formula (4.6) is algebraically equivalent to the 
general method. 

With an observed χ² of 31.3 when χ²_.999 is only 13.8, it is clear 
that men and women cannot be presumed to hold similar opinions 
concerning the issue under discussion. 


TABLE 4.6 Computation of χ² from the Data of Table 4.5. 


          a_i        n_i        a_i/n_i     a_i²/n_i 
          296        532        .55639      164.69 
          183        478        .38285       70.06 
          228        454        .50220      114.50 
Sum     A = 707    N = 1464                 349.25 

A/N = 707/1464 = .48292 
B/N = 757/1464 = .51708 
(A/N)(B/N) = .2497 
A²/N = (707)(.48292) = 341.43 
Σ a_i²/n_i − A²/N = 349.25 − 341.43 = 7.82 

χ² = 7.82/.2497 = 31.3 

n = (3 − 1)(2 − 1) = 2 
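
The shortcut for a table with one dichotomous variable can be checked against the general method. The sketch below uses illustrative function names. It implements Formula (4.6) as reconstructed above and compares it with the sum of (f − F)²/F over all six cells of Table 4.5.

```python
def chi2_two_by_k(a, b):
    """Formula (4.6): (sum of a_i^2 / n_i - A^2 / N) divided by (A/N)(B/N)."""
    n = [x + y for x, y in zip(a, b)]
    A, B = sum(a), sum(b)
    N = A + B
    return (sum(x * x / ni for x, ni in zip(a, n)) - A * A / N) / ((A / N) * (B / N))


def chi2_general(table):
    """General method: expected = row total * column total / N, then sum (f - F)^2 / F."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    N = sum(rows)
    return sum(
        (f - r * c / N) ** 2 / (r * c / N)
        for row, r in zip(table, rows)
        for f, c in zip(row, cols)
    )


men = [296, 183, 228]      # a_i
women = [236, 295, 226]    # b_i

shortcut = chi2_two_by_k(men, women)
general = chi2_general([list(pair) for pair in zip(men, women)])
print(round(shortcut, 1), round(general, 1))
```

Swapping the roles of the two groups gives the same value, which is the check the text asks the student to carry out with the b_i frequencies.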


c. Two classes for each variable. From the data described in 
the preceding paragraph there might be made a test of the 
hypothesis that the per cent of persons undecided is the same among 
men as among women. The observed per cents holding “no 
opinion” are 25.88% for men and 38.97% for women. The 
frequency distribution presented as a double dichotomy is shown in 
Table 4.7. The two computing routines previously described are 
each applicable here and the student may gain practice by apply- 
ing them. However the method about to be described will produce 
the same outcome with less computation. 


TABLE 4.7 Frequency of Observed Response to the Question of Increasing
Taxes to Support Current School Services, Classified by Indecision and Sex of
Respondent.*

                              Men    Women    Total

Holding definite opinion      524      462      986
Holding no opinion            183      295      478
TOTAL                         707      757     1464


* Data taken from Rope, Opinion Conflict and School Support. 


The four frequencies will be denoted a, b, c and d, with marginal
frequencies a + b, c + d, a + c, and b + d, and N = a + b + c + d.

        a          b        a + b
        c          d        c + d
      a + c      b + d        N

The number of degrees of freedom is n = 1.
The general formula for $\chi^2$ reduces in this case to the special
formula

(4.7)  $\chi^2 = \frac{(ad - bc)^2 N}{(a + b)(c + d)(a + c)(b + d)}$

Here ad - bc = (524)(295) - (462)(183) = 70034 and

$\chi^2 = \frac{(70034)^2 (1464)}{(986)(478)(707)(757)} = 28.5$

whereas for 1 degree of freedom $\chi^2_{.001}$ is only 10.8.

The hypothesis of equal per cents of undecided persons in the 
two populations is untenable in face of such data. 

Relation of $\chi^2$ to the Statistic of Formula (3.12). The reader
will recall that in Chapter 3, page 78, a problem very similar to
the one of the preceding paragraph was solved by obtaining the
difference between observed per cents in two independent samples,
dividing that difference by its estimated standard deviation and
referring the result to a table of the normal probability curve. It
may now be stated that the two methods lead to the same result.

For a large sample if $\chi^2$ has 1 degree of freedom, $\chi = \sqrt{\chi^2}$ has a
distribution which is the right-hand half of a normal distribution. The
general proof of the preceding statement involves calculus and so
will not be presented here. However the algebraic equivalence of

$\chi^2 = \frac{(ad - bc)^2 N}{(a + b)(a + c)(c + d)(b + d)}$   and   $z^2 = \frac{(p_1 - p_2)^2}{p(1 - p)\,\dfrac{N_1 + N_2}{N_1 N_2}}$

is very easily established by sheer algebraic manipulation, using
either the substitution

$N_1 = a + c$,   $N_2 = b + d$,   $N = N_1 + N_2$
$p_1 = \frac{a}{N_1}$,   $p_2 = \frac{b}{N_2}$,   $p = \frac{a + b}{N}$




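or the substitution

$N_1 = a + b$,   $N_2 = c + d$,   $N = N_1 + N_2$
$p_1 = \frac{a}{N_1}$,   $p_2 = \frac{c}{N_2}$,   $p = \frac{a + c}{N}$

The equivalence of the two methods for the data of Table 4.7
may be seen by computation as follows:

Let $p = \frac{986}{1464} = .6735$ and $q = 1 - p = \frac{478}{1464} = .3265$

$\frac{p(1 - p)(N_1 + N_2)}{N_1 N_2} = \frac{(.3265)(.6735)(1464)}{(707)(757)} = .0006015$

$p_1 = \frac{524}{707} = .7412$
$p_2 = \frac{462}{757} = .6103$

$z^2 = \frac{(.2588 - .3897)^2}{.0006015} = \frac{.0171348}{.0006015} = 28.49$

Here .2588 and .3897 are the complements of $p_1$ and $p_2$, that is, the
proportions holding no opinion; their difference has the same
absolute value as $p_1 - p_2$. The result agrees with the value
previously given for $\chi^2$.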

Inasmuch as the normal distribution is tabulated for finer
intervals of the argument than the chi-square distribution, it is
a satisfactory procedure to compute $\chi^2$ by Formula (4.7), to
obtain z by taking the square root of $\chi^2$, and to refer the result
to a table of normal probability. If the logic of the problem seems
to relate to the difference of two per cents, the outcome may be
discussed as a test of the hypothesis $P_1 = P_2$. If the logic seems
to relate to the independence of two variables the outcome may
be discussed in those terms.
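The agreement between the two methods can be illustrated numerically. The following Python sketch is not from the original text; it assumes that the statistic of Formula (3.12) is the difference of two sample proportions divided by its estimated standard deviation under a pooled proportion, as used in the computation above, and the function name is illustrative.

```python
import math


def z_for_two_proportions(x1, n1, x2, n2):
    """Difference of two proportions divided by its estimated standard deviation."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)                      # pooled proportion
    variance = p * (1 - p) * (n1 + n2) / (n1 * n2)
    return (p1 - p2) / math.sqrt(variance)


# Table 4.7: 183 of 707 men and 295 of 757 women hold no opinion.
z = z_for_two_proportions(183, 707, 295, 757)

# Formula (4.7) applied to the same double dichotomy.
a, b, c, d = 524, 462, 183, 295
n = a + b + c + d
chi_square = (a * d - b * c) ** 2 * n / ((a + b) * (c + d) * (a + c) * (b + d))

# Two-sided normal probability of a deviate at least as large as |z|.
two_sided_probability = math.erfc(abs(z) / math.sqrt(2))

print(round(z * z, 2), round(chi_square, 2), two_sided_probability)
```

The squared deviate and the chi-square value coincide apart from floating-point error; the small gap between 28.5 and 28.49 in the hand computation comes only from rounding the intermediate proportions.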

Comparison of Two Proportions Based on the Same Indi- 
viduals. An interesting type of problem arises when the same 
individuals are measured on two different occasions and a com- 
parison of the per cents registering a given characteristic on the 
two occasions is required. Suppose, for example, that in an 
elementary school a test of racial prejudice is administered to 
200 pupils of whom 78 return highly prejudiced answers. A movie 
designed to reduce prejudice is shown them a week or so later, 
and after another day the same test is readministered, this time 
with only 52 pupils showing highly prejudiced answers. None of 
the methods previously described in this chapter can be used, 
for in all of them the per cents to be compared were based on 
different individuals. 


Let  b + d = number of children showing prejudice on first test
     c + d = number of children showing prejudice on second test

Then b = number showing prejudice on first but not on second
and  c = number showing prejudice on second but not on first

We are interested in the hypothesis that in the population the
proportion showing prejudice only on the first test is equal to the
proportion showing it only on the second. This hypothesis is
tested by computing

(4.8)  $\chi^2 = \frac{(b - c)^2}{b + c}$   with 1 degree of freedom


It is therefore necessary to have not only the two figures b + d = 78
and c + d = 52 but also at least one of the four cell entries.
Suppose we know that 30 pupils showed prejudice on both tests,
so d = 30.

Then b = 78 - 30 = 48
     c = 52 - 30 = 22

$\chi^2 = \frac{(48 - 22)^2}{48 + 22} = \frac{676}{70} = 9.66$

$z = \sqrt{\chi^2} = \sqrt{9.66} = 3.1$

The reduction from $p_1 = \frac{78}{200} = .39$ to $p_2 = \frac{52}{200} = .26$ is highly
significant and cannot be attributed to chance.
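The computation for proportions based on the same individuals is short enough to sketch in Python. This is an illustration only; the function name `chi_square_for_changes` is an assumption and does not appear in the text.

```python
import math


def chi_square_for_changes(b, c):
    """Formula (4.8): chi-square with 1 degree of freedom from the two 'changer' cells."""
    return (b - c) ** 2 / (b + c)


first_total, second_total, both = 78, 52, 30   # b + d, c + d, and d
b = first_total - both     # prejudiced on the first test only
c = second_total - both    # prejudiced on the second test only

chi_square = chi_square_for_changes(b, c)
z = math.sqrt(chi_square)
print(b, c, round(chi_square, 2), round(z, 1))   # 48 22 9.66 3.1
```

Note that the pupils who gave the same kind of answer on both occasions do not enter the statistic at all; only the 48 + 22 = 70 pupils who changed carry information about the hypothesis.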

The Two-by-two Contingency Table in Very Small Samples. 
That a smooth curve does not provide a very close approximation 
to exact probabilities when samples are quite small has been noted 
in Chapter 3 and in the early part of this chapter. In Chapter 3 
the normal curve was seen to provide a good approximation to 
the binomial for large values of N but not for small. In this 
chapter the smooth chi-square curve was seen to provide a good 
approximation to the exact x? probabilities for large N and not 
for small. This situation raises the question as to how to deal 
with problems similar to those discussed in the preceding section 
when the samples are quite small. The question of what is meant 
by “quite small" will be disregarded for the moment but dis- 
cussed later on. 

In his study of Adolescent Fantasy, Symonds asked adolescents
to write a story about each of a set of pictures he presented
to them. Each story was then read and categorized as using or
not using a particular theme such as violence, sex, bad companions,
illness, etc. One of the questions in which he was interested
was the possible discovery of a sex difference with respect to the
use of a particular theme. Each child was classified as using the
theme if it occurred in at least two of his stories and as not using
it if it occurred in one or none of his stories.

Symonds used the same number of boys and girls. In order 
that the student may see the method in a more general form with- 
out the special condition introduced by keeping the size of the 
two samples equal, the original data will be changed slightly. 
Suppose that 16 boys and 18 girls have been included in such an
experiment, and that of the boys 7 have used the theme of violence
while of the girls only 1 has used it. The corresponding observed
proportions are $p_1 = \frac{7}{16} = .4375$ and $p_2 = \frac{1}{18} = .0556$. Is the
difference between these proportions compatible with the hypothesis
that in the population the proportions are equal, $P_1 = P_2 = P$?

It is convenient to present the observed results in the form of
a contingency table, as follows:

                                     Boys    Girls    Total

Using the theme of violence            7        1        8
Not using the theme of violence        9       17       26
TOTAL                                 16       18       34

We shall now consider the hypothesis that sex is unrelated to
the tendency to use a theme of violence and shall assume that all
samples have the same marginal totals:

                      Boys    Girls
Using theme             a       b        8
Not using theme         c       d       26
                       16      18       34


Under this restriction there are 9 sets of possible values which
can be taken by the cell frequencies a, b, c and d, and these are
listed as arrangements A to I in Table 4.8. The probability of
any one arrangement is given by the formula

(4.9)  $\frac{(a + b)!\,(c + d)!\,(a + c)!\,(b + d)!}{N!\,a!\,b!\,c!\,d!}$

The probability for each of the 9 possible arrangements has
been worked out in Table 4.8.
TABLE 4.8 All Possible Arrangements of Cell Frequencies in a Two-by-Two
Table with Fixed Marginal Frequencies, and the Probability Associated with Each
Arrangement.

Arrangement     a     b     c     d     Probability
     A          8     0     8    18       .00071
     B          7     1     9    17       .01134
     C          6     2    10    16       .06748
     D          5     3    11    15       .19631
     E          4     4    12    14       .30674
     F          3     5    13    13       .26427
     G          2     6    14    12       .12270
     H          1     7    15    11       .02804
     I          0     8    16    10       .00241

The probability distribution of Table 4.8 might be represented
graphically in the manner of Figures 3-1 and 3-2. Now suppose
.05 has been selected as the level of significance and the critical
region is to be so chosen that it will subtend .025 of the area in
each tail. Then it is clear that arrangements A and B with probabilities
.00071 + .01134 = .01205 fall in the critical region at one
end of the distribution and that arrangement I falls into the
critical region at the other. Any one of these three arrangements
would be held incompatible with the hypothesis; none of the
other six would be considered as throwing suspicion on that hypothesis.
Since the observed data have shown arrangement B,
the data present evidence of a sex difference with boys tending to
use the theme of violence oftener than girls.
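The factorial arithmetic behind Table 4.8 is tedious by hand but brief by machine. The sketch below is an illustration in Python rather than anything given in the text; it rewrites Formula (4.9) as a ratio of binomial coefficients, which is algebraically the same quantity, and the names used are assumptions.

```python
from math import comb


def arrangement_probability(a, b, c, d):
    """Probability of one two-by-two arrangement with fixed margins, Formula (4.9)."""
    n = a + b + c + d
    # Equal to (a+b)!(c+d)!(a+c)!(b+d)! / (N! a! b! c! d!).
    return comb(a + c, a) * comb(b + d, b) / comb(n, a + b)


boys, girls, users = 16, 18, 8     # fixed marginal totals of the example

probabilities = {}
for label, a in zip("ABCDEFGHI", range(users, -1, -1)):
    b = users - a
    probabilities[label] = arrangement_probability(a, b, boys - a, girls - b)

for label, p in probabilities.items():
    print(label, f"{p:.5f}")

# Arrangements at least as extreme as the observed one (B) in the same tail.
upper_tail = probabilities["A"] + probabilities["B"]
print(round(upper_tail, 5))
```

The nine probabilities sum to 1, and the tail containing A and B comes to about .012, below the .025 allotted to that tail.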

Because of the arithmetic involved the method just described
can be recommended only for very small samples. Certain aids
are available. A table of the logarithms of factorials is to be
found in Statistical Tables by Fisher and Yates.

In small samples the usual computation of $\chi^2$ gives too large
a value, leading to rejection of the hypothesis more often than
would the direct computation of probability by factorials. This
error can be offset by a procedure commonly known as Yates'
correction. The procedure is to change the frequency in
each cell by .5, keeping the marginal totals unchanged, and reducing
the size of $\chi^2$. Thus observed frequencies

      5 | 10
      7 |  2

with $\chi^2 = 4.4$ would be changed to

    5.5 | 9.5
    6.5 | 2.5

with $\chi^2 = 2.8$. The same effect can be produced more easily
without rewriting the frequencies by subtracting $\frac{1}{2}N$ from the
absolute value of ad - bc. Thus in the illustration N = 24 and
ad - bc = 10 - 70 = -60, and $|ad - bc| - \frac{1}{2}N = 60 - 12 = 48$. The
vertical bars around a number indicate its numerical value
without regard to sign.

If $\chi_c^2$ represents a value of $\chi^2$ adjusted by Yates' correction,
then

(4.10)  $\chi_c^2 = \frac{\left(|ad - bc| - N/2\right)^2 N}{(a + b)(a + c)(b + d)(c + d)}$

Verify that applying Formula (4.10) to the frequencies on the left
gives the same result as applying the usual $\chi^2$ formula to the
adjusted frequencies on the right.

     10 |  4          9.5 |  4.5
      2 |  4          2.5 |  3.5

      5 |  4          5.5 |  3.5
     15 |  2         14.5 |  2.5

     20 |  6         19.5 |  6.5
      5 | 15          5.5 | 14.5


When the expected frequency in every cell is large, application
of Yates' correction will have a negligible effect on $\chi^2$. The
following working rule is suggested: (1) If the significance level
obtained from $\chi^2$ is larger than the predetermined significance
level, the hypothesis may be retained. (2) If the significance
level obtained from $\chi^2$ is smaller than the predetermined significance
level, the “corrected” $\chi_c^2$ should be computed. If the
probability obtained from it is also smaller than the significance
level, the hypothesis may be rejected. (3) If decisions based on
$\chi^2$ and on “corrected” $\chi_c^2$ are in disagreement, the exact probability
should be calculated by the method described on page 103.
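The effect of Yates' correction on the illustration above can be checked with a few lines of Python. This is a sketch only; the function name and the `yates` flag are illustrative choices, not notation from the text.

```python
def chi_square_2x2(a, b, c, d, yates=False):
    """Chi-square for a two-by-two table; with yates=True, Formula (4.10)."""
    n = a + b + c + d
    difference = abs(a * d - b * c)
    if yates:
        difference -= n / 2          # subtract N/2 from |ad - bc|
    return difference ** 2 * n / ((a + b) * (c + d) * (a + c) * (b + d))


uncorrected = chi_square_2x2(5, 10, 7, 2)
corrected = chi_square_2x2(5, 10, 7, 2, yates=True)
from_adjusted_cells = chi_square_2x2(5.5, 9.5, 6.5, 2.5)

print(round(uncorrected, 1), round(corrected, 1), round(from_adjusted_cells, 1))
```

The corrected value and the value obtained from the adjusted cell frequencies agree, as the text asks the reader to verify. The sketch does not guard against tables in which |ad - bc| is smaller than N/2, where subtracting N/2 would no longer correspond to moving each cell by .5 toward its expectation.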

Use of the Chi-square Approximation when Expectations Are
Small. Use of the table of areas under the chi-square curve as an
approximation to the discrete chi-square distribution is justified
mathematically when the number of cases in the sample is large.
The research worker, who needs some exact definition of the word
“large” in this situation, can find no simple answer to his question.
The discussion in this section is directed toward
finding practical rules that are reasonable and clear.

The word “large” relates to the number of cases expected (not 
the number observed) in a cell. Theory requires that all expecta- 
tions shall be “large.” It is commonly stated that the chi-square 
curve provides an adequate approximation when the least expec- 
tation in any cell is five. This requirement seems excessively high. 

Figure 4-3, which is based on samples of only 12 cases in three
cells, the expected value per cell under the hypothesis tested being
only 4, shows fairly close approximation of the values computed
by the chi-square curve to the exact chi-square probabilities.
Similar results were obtained by Neyman and Pearson on the
basis of ten cases in three cells. Cochran discusses the situation
when the expected number of cases is small for one of the cells
and large for all the remaining cells and the number of degrees
of freedom is two or more. In this situation the $\chi^2$ table will
provide a good fit even if the expectation in the one cell is as small
as one.

The following practical rules of thumb are suggested for testing
significance by use of the tables of areas under the chi-square
curve.

1. If there is only 1 degree of freedom, follow the suggestion 
previously given for the use of Yates’ correction. 

2. If there are 2 or more degrees of freedom and the expecta- 
tion in each cell is more than 5, the chi-square table assures a good 
approximation to the exact probabilities. 

3. If there are 2 or more degrees of freedom and roughly 
approximate probabilities are acceptable for the test of signif- 
icance, an expectation of only 2 in a cell is sufficient. 

4. If there are more than 2 degrees of freedom and the ex- 
pectation in all the cells but one is 5 or more, then an expectation 
of only one in the remaining cell is sufficient to provide a fair 
approximation to the exact probabilities. 

5. If the logic of the problem permits, combine some of the 
classes to increase the expectations in the cells when several cells 
have very small expectations. 
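Since these rules turn on the expected rather than the observed frequencies, it can help to list the expectations before choosing a procedure. The Python sketch below is illustrative only; it assumes the usual expectation under independence (row total times column total divided by N), and the function name is hypothetical.

```python
def expected_frequencies(table):
    """Expected cell frequencies under independence for a table given as rows."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in column_totals] for r in row_totals]


# Table 4.5: rows are the three opinions, columns are men and women.
table = [[296, 236], [183, 295], [228, 226]]
expected = expected_frequencies(table)
smallest = min(min(row) for row in expected)
degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)

print(degrees_of_freedom, round(smallest, 1))
```

For Table 4.5 the smallest expectation is far above 5 with 2 degrees of freedom, so rule 2 applies and the chi-square table can be used without hesitation.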

Summary. Three new ideas are presented in this chapter,
(1) the idea of $\chi^2$ as a measure of discrepancy between a set of
observed frequencies and the corresponding frequencies expected
under some hypothesis, (2) the idea of a contingency table, and
(3) the idea of degrees of freedom. Several methods of computing
the statistic $\chi^2$ are presented and it is important to understand
the circumstances under which each may be used. The principal
skill developed is the ability to use a table of the $\chi^2$ distribution.


REFERENCES

1. Bliss, C. I., “A Chart of the Chi-square Distribution,” Journal of the American
   Statistical Association, 39 (June, 1944), 246-248.
2. Cochran, W. G., “The Chi-square Distribution for the Binomial and Poisson
   Series with Small Expectations,” Annals of Eugenics, 7 (1936), 207-217.
3. Cochran, W. G., “The Chi-square Correction for Continuity,” Iowa State
   College Journal of Science, 16 (1942), 421-436.
4. Cochran, W. G., “The Comparison of Percentages in Matched Samples,”
   Biometrika, 37 (December, 1950), 256-266.
5. Eisenhart, C., Hastay, M. W., and Wallis, W. A., Selected Techniques of
   Statistical Analysis, New York, 1947, McGraw-Hill Book Co.
6. Fisher, R. A. and Yates, F., Statistical Tables for Biological, Agricultural and
   Medical Research, New York, 1948, Hafner Publishing Company, 3d ed.
7. Neyman, J. and Pearson, E. S., “Further Notes on the $\chi^2$ Distribution,”
   Biometrika, 22 (1931), 298-305.
8. Pearson, K., “On a Criterion that a Given System of Deviations from the
   Probable in the Case of a Correlated System of Variables is such that it Cannot
   be Reasonably Supposed to Have Arisen from Random Sampling,”
   Philosophical Magazine, 5th series, 50 (1900), 339-357.
9. Rope, Frederick T., Opinion Conflict and School Support, New York, 1941,
   Teachers College, Columbia University, Bureau of Publications.
10. Symonds, P. M., Adolescent Fantasy; an Investigation of the Picture-story
    Method of Personality Study, New York, 1949, Columbia University Press.
11. Walker, H. M., “Degrees of Freedom,” Journal of Educational Psychology,
    31 (1940), 253-269.
12. Walker, H. M., Mathematics Essential for Elementary Statistics, New York,
    1951, Henry Holt and Company, Chapter 22, Degrees of Freedom. Equations
    with Several Variables.
13. Yates, F., “Contingency Tables Involving Small Numbers and the $\chi^2$ Test,”
    Supplement to the Journal of the Royal Statistical Society, 1 (1934), 217-235.


5  Populations and Samples on a Continuous Variable


The methods considered in the first four chapters are 
suitable for obtaining inferences from data classified in discrete 
classes. In dealing with data arising from measurement of a 
continuous variable these methods are insufficient, although the 
logic used is in its general outlines the same as that already de- 
scribed. This chapter will present some general concepts regarding 
distributions on a continuous variable. It will present symbolism, 
formulas for obtaining the mean and standard deviation of a 
sample, and methods for testing the hypothesis that a popula- 
tion distribution is normal. A student who has had an introduc- 
tory course should be able to read pages 111 to 117 very rapidly; 
others will need practice with symbolism and computation. 

Mathematical Model of the Population. The first populations 
considered in this book were dichotomies. For these the mathe- 
matical model consisted of a definition of the two discrete classes 
and a statement that the proportion of cases in one class was P
and in the other Q. Later P and Q were called the probabilities
of these classes and the population distribution thus specified was 
recognized as a probability distribution. Subsequently we con- 
sidered populations having more than two discrete classes, the 
mathematical model for such being a definition of the classes and 
a statement of the probability of each class. For populations on 
a continuous variable, probabilities are defined by areas under a 
curve. There are many forms of probability curves of which 
the most widely known is the normal curve. Practice in computing 
probabilities as segments of area between two ordinates of the 
normal curve has already been given in Chapter 2. 

The Mean and Variance of a Population. A population may
be considered to have a variety of characteristics, among which
are the mean and the standard deviation. If X represents the
variable, these have already been defined as

$\mu = E(X)$   and   $\sigma = \sqrt{E(X - \mu)^2}$

If the probability distribution of the population is known exactly,
$\mu$ and $\sigma$ can be calculated by means of the integral calculus, the
procedure being analogous to that used in Formulas (2.4) and
(2.8) on pages 28 and 29 for discrete distributions. If the probability
distribution is not known exactly but some information
about its nature is available, the mean and variance or standard
deviation can be estimated from a sample. Thus for probability
distributions known to have the binomial form, the estimation
of the mean was discussed in Chapter 3. Methods of estimating
characteristics of continuous populations will be discussed in later
sections.

Normal Populations. The normal distribution was described 
in Chapter 2 as an approximation to the binomial distribution, 
and was used in Chapter 3 to compute probabilities of samples. 
An important use of the normal distribution is to provide a mathe- 
matical description for populations. A population so described is 
called a normal population. Most of the methods discussed in the 
following chapters, with the exception of those in Chapter 18, 
are based on the assumption that observations are drawn from a 
normal population. Some aspects of the normal distribution not 
previously considered will now be described.* 

The normal probability curve is a smooth distribution with
unlimited range in each direction and with probability density
given by the equation

(5.1)  $y = \frac{1}{\sigma\sqrt{2\pi}}\; e^{-\frac{(X - \mu)^2}{2\sigma^2}}$

y is the ordinate of the curve, called the probability density.

X is the variable for which (5.1) furnishes the probability distribution,
and is distributed along the horizontal axis in all the
graphs of the normal curve presented in this text.

$\pi$ is the familiar number with approximate value 3.1416.

e is a number with approximate value 2.718.

$\mu = E(X)$ is the expected value of X. Being the mean of the
normal distribution, it determines the location of the distribution
along the scale of X. It may be positive, negative or
zero. The mean, median and mode of the normal curve are
identical in value.

$\sigma^2 = E(X - \mu)^2$ and may have any positive value. The standard
deviation $\sigma$ of X is the unit of measure for the horizontal axis.
The area under the curve of Formula (5.1) is unity.

* Readers who are not mathematicians and who desire further discussion of the
equation of the normal curve than is given here may consult pages 245-248 of Walker,
Mathematics Essential for Elementary Statistics.

The ordinate y depends not only on the particular value of
X which defines the point where the ordinate is erected but also
on the values of $\mu$ and $\sigma$. When $\mu$ and $\sigma$ are given particular
values, an ordinate y can be calculated for each X and a
curve represented by these ordinates can be drawn. For another
pair of values of $\mu$ and $\sigma$ another curve is obtained. Thus Formula
(5.1) represents a family of curves. Variables like $\mu$ and $\sigma$, specific
values of which determine a particular member of a family of
curves, are called parameters.

The symbols e and $\pi$ represent numbers which always have
the same value. The symbols $\mu$ and $\sigma$ represent numbers which
are constant for any one curve but vary from one curve to another.
The symbols y and X represent variables which fluctuate even
for one particular curve.
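The distinction between constants, parameters, and variables can be made concrete with a short Python sketch of Formula (5.1). It is an illustration only: the function names are assumptions, the pairs of parameter values are chosen merely as examples, and the area is obtained by a simple numerical rule rather than by the calculus.

```python
import math


def normal_density(x, mu, sigma):
    """Ordinate y of the normal curve of Formula (5.1) at the point x."""
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return math.exp(exponent) / (sigma * math.sqrt(2 * math.pi))


def area_under_curve(mu, sigma, steps=20000):
    """Midpoint-rule area between mu - 8*sigma and mu + 8*sigma."""
    low, high = mu - 8 * sigma, mu + 8 * sigma
    width = (high - low) / steps
    total = sum(normal_density(low + (k + 0.5) * width, mu, sigma)
                for k in range(steps))
    return total * width


# Illustrative (hypothetical) parameter pairs: each pair gives one member of the family.
for mu, sigma in [(15, 5), (20, 2.5), (28, 5)]:
    peak = normal_density(mu, mu, sigma)      # highest ordinate, at X = mu
    print(mu, sigma, round(peak, 4), round(area_under_curve(mu, sigma), 4))
```

Changing the parameters moves or stretches the curve and changes its highest ordinate, while the area stays at unity for every member of the family.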

In Figure 5-1 are three normal curves corresponding to three
pairs of values of $\mu$ and $\sigma$, drawn on the same scale of values of X.

FIG. 5-1. Three normal curves with same area but
different values of $\mu$ and $\sigma$ (the curves are centered at $\mu$ = 15, $\mu$ = 20, and $\mu$ = 28).

Strictly speaking a sample cannot have a normal distribution.
As soon as actual values are observed, those values have some
finite range instead of the unlimited range of the normal curve;
the values if plotted take the form of a histogram instead of a
smooth curve; and sampling irregularities distort the form of the
curve slightly. Therefore the normal curve is best considered
as a probability distribution describing either a population distribution
or the sampling distribution of a statistic but not describing
an observed distribution of a sample.

Use of Summation Sign. One of the most widely used symbols
in statistical literature is the large Greek letter sigma, $\Sigma$, corresponding
to a capital S. It is called the summation sign. Before
giving the formulas for the mean and the standard deviation of a
sample, the use of the summation sign may well be reviewed.

If $X_1$ represents the score of the first individual, $X_i$ the score
of the $i$th, and N the entire number of individuals in a sample,

$X_1 + X_2 + \cdots + X_i + \cdots + X_N = \sum_{i=1}^{N} X_i$

$\sum_{i=1}^{N} X_i$ may be read “the sum of the scores of individuals 1 to N,” or
“the sum of values $X_i$ where $i$ ranges from 1 to N” or “summation
X extending from $X_1$ to $X_N$.”

The purpose of the variable subscript and the limits of sum- 
mation is clarity. They may be omitted if they are not needed 
to prevent confusion. In elementary work summation is almost 
always over the N individuals in a sample and therefore because 
the limits of summation are assumed to be from 1 to N they are 
usually not written. In later sections of this book beginning with 
Chapter 9 observations may be classified on several traits simul- 
taneously and formulas may be ambiguous unless the limits of 
summation are indicated. 

The formulas for the mean and standard deviation of a sample
when the observations are classified in step intervals may be clarified
by means of subscripts. For example, suppose 20 individuals
have the following scores:

2, 4, 9, 6, 3, 5, 6, 6, 3, 3, 7, 5, 7, 4, 3, 8, 8, 6, 5, 4

The sum of these scores may be indicated as

$\sum_{i=1}^{20} X_i = 104$

or as

$\sum_{j=1}^{8} f_j X_j = 1(9) + 2(8) + 2(7) + 4(6) + 3(5) + 3(4) + 4(3) + 1(2) = 104$

When the mean of these 20 scores is computed in the customary
manner for grouped data, each class index $X_j$ is multiplied by the
frequency in the class $f_j$ and the results are summed, $\sum f_j X_j$. But
here the variable subscript $j$ does not run from 1 to N but takes
only as many values as there are class intervals, which in this
example is 8. Therefore we write either

$\sum_{i=1}^{20} X_i = 104$   or   $\sum_{j=1}^{8} f_j X_j = 104$

Note carefully the effect of parentheses in the following:

$\sum_{i=3}^{5} (X_i + a) = (X_3 + a) + (X_4 + a) + (X_5 + a) = X_3 + X_4 + X_5 + 3a$

$\sum_{i=3}^{5} X_i + a = X_3 + X_4 + X_5 + a$

$\sum_{i=3}^{5} (X_i + Y_i) = (X_3 + Y_3) + (X_4 + Y_4) + (X_5 + Y_5)$
$\qquad = (X_3 + X_4 + X_5) + (Y_3 + Y_4 + Y_5) = \sum_{i=3}^{5} X_i + \sum_{i=3}^{5} Y_i$


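EXERCISE 5.1
1. Study the summations represented by the following symbols and
their equivalents:

$\sum_{i=3}^{6} X_i = X_3 + X_4 + X_5 + X_6$

$\sum_{i=10}^{12} a x_i = a x_{10} + a x_{11} + a x_{12} = a(x_{10} + x_{11} + x_{12})$

$\sum_{i=6}^{10} (X_i - 5) = (X_6 - 5) + (X_7 - 5) + (X_8 - 5) + (X_9 - 5) + (X_{10} - 5)$
$\qquad = X_6 + X_7 + X_8 + X_9 + X_{10} - 25$

$\sum_{i=6}^{8} 3y_i^2 = 3y_6^2 + 3y_7^2 + 3y_8^2 = 3(y_6^2 + y_7^2 + y_8^2)$

$\sum_{i=5}^{8} 2(Z_i - b) = 2(Z_5 - b) + 2(Z_6 - b) + 2(Z_7 - b) + 2(Z_8 - b)$
$\qquad = 2(Z_5 + Z_6 + Z_7 + Z_8) - 8b$

$\sum_{i=7}^{9} x_i y_i = x_7 y_7 + x_8 y_8 + x_9 y_9$

$\sum_{i=1}^{4} f_i X_i = f_1 X_1 + f_2 X_2 + f_3 X_3 + f_4 X_4$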

2. Write an equivalent expression for each of the following, making 
use of the summation sign: 

a. $X_6 + X_7 + X_8 + X_9$

b. 4x; + 415 + 4c; + 4% 

c. $3x_6 y_6 + 3x_7 y_7 + 3x_8 y_8$

d. (Ys — А) + (Ys — A) + (Ys — A) + (Ys - A) 

e. $f_1 X_1 + f_2 X_2 + f_3 X_3 + f_4 X_4 + f_5 X_5 + f_6 X_6$

3. Assume the following individual scores: 

Individual 115352117309 74. 15: 16 T 8. 9. а 

Зоте X ВИЗЕ 61.0. 61415114 8 




Find the numerical value of the following expressions: 


E YU е. ie. zi (2х) 
4 


12 12 
d. Y(X,- Xy h. &( x) 


4. Assume that each of N — 10 individuals has been measured on 
variable X and also on variable Y with results as follows: 


Individual 1 2 3 4 5 6 7 8 9 10 

ОИ ОВ л pU Г 10 7 uae 4 

Бсоге оп У 2 D лӨ ЕТЕ А DEO. ^g 1 

Find the numerical value of the following expressions: 

are €. 3 xy, Exe Due 
А Мск terry o EE N 


> 


b х 1 N N 
Ве Ма е от 
i=1 i=1 


N N N 
с. Уха + Y(x;-Xy i Y(y,-Yy 
i=1 i=1 =1 

Definition of Mean, Variance, and Standard Deviation for a 
Sample. In any sample there is likely to be duplication of scores. 
Even when no two observations are precisely equal, it is usually 
convenient to group observations into intervals at least some of 
which have frequencies larger than 1. Both of these situations 
will be referred to as situations in which data are grouped. When 
working at a computing machine it is often convenient to take
scores one at a time as they come, without grouping them. Such
situations will be referred to as computation without grouping. 

Definitional formulas for the mean and variance will now be 
given for both grouped and ungrouped data. Computational 
formulas will be given in the following section. The standard 
deviation is defined as the square root of the variance and does 
not require a separate formula. 


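(5.2)  Mean, without grouping   $\bar{X} = \frac{\sum_{i=1}^{N} X_i}{N}$

(5.3)  Mean, with grouping   $\bar{X} = \frac{\sum_{j=1}^{k} f_j X_j}{\sum_{j=1}^{k} f_j} = \frac{\sum_{j=1}^{k} f_j X_j}{N}$

In Formula (5.3) each of the k different values of $X_j$ may be
thought of as defining a class. Then $f_j/N$ is the proportional
frequency in that class and may be denoted $p_j$ for the sample or
$P_j$ for the population. Then

(5.4)  $\bar{X} = \sum_{j=1}^{k} p_j X_j$

(5.5)  Let $x_i = X_i - \bar{X}$ or $x_j = X_j - \bar{X}$

Then the variance without grouping is

(5.6)  $s^2 = \frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N - 1} = \frac{\sum_{i=1}^{N} x_i^2}{N - 1}$

and the variance with grouping is

(5.7)  $s^2 = \frac{\sum_{j=1}^{k} f_j (X_j - \bar{X})^2}{N - 1} = \frac{\sum_{j=1}^{k} f_j x_j^2}{N - 1}$
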


Some readers will be accustomed to using N rather than N — 1 
in the denominator of variance and standard deviation. The 
reason for using N — 1 is that it leads to a more satisfactory 
estimate of the population variance as will be explained in later 
sections of this chapter. N — 1 is the number of degrees of freedom 
associated with the variance. 
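The definitional formulas translate directly into code. The sketch below, in Python, is offered only as an illustration of Formulas (5.2), (5.3), (5.6) and (5.7); the function names are assumptions, and the grouped version takes a mapping from each distinct score to its frequency.

```python
from collections import Counter


def mean_and_variance(scores):
    """Formulas (5.2) and (5.6): mean and variance without grouping."""
    n = len(scores)
    mean = sum(scores) / n
    variance = sum((x - mean) ** 2 for x in scores) / (n - 1)
    return mean, variance


def grouped_mean_and_variance(frequencies):
    """Formulas (5.3) and (5.7): mean and variance from values and their frequencies."""
    n = sum(frequencies.values())
    mean = sum(f * x for x, f in frequencies.items()) / n
    variance = sum(f * (x - mean) ** 2 for x, f in frequencies.items()) / (n - 1)
    return mean, variance


scores = [42, 37, 49, 45, 42]
print(mean_and_variance(scores))                    # (43.0, 19.5)
print(grouped_mean_and_variance(Counter(scores)))   # (43.0, 19.5)
```

Both routines divide by N - 1, the number of degrees of freedom, and both subtract the mean from every score, which is exactly the labor the computational formulas of the next section are designed to avoid.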

Formulas for the Computation of the Mean, Variance, and 
Standard Deviation for a Sample. In computation it would 
usually be an uneconomical procedure to subtract the mean from 
each individual score. Instead deviations are taken from some 
arbitrarily chosen value as origin and the computation is com- 
pleted by a formula which is algebraically equivalent to the 
definition formula. 

Taking the origin at zero is particularly convenient when com- 
putation is made by machine. Deviations from zero are called 
“gross scores” or “raw scores.” The variance of ungrouped 
scores with origin at zero may be computed from Formula (5.8) 


(5.8)  $s^2 = \frac{\sum_{i=1}^{N} X_i^2 - \left(\sum_{i=1}^{N} X_i\right)^2 / N}{N - 1} = \frac{\sum X^2 - (\sum X)^2 / N}{N - 1}$

or by the equivalent Formula (5.8a) which has the advantage that
no rounding errors occur until the final division by N(N - 1).

(5.8a)  $s^2 = \frac{N \sum_{i=1}^{N} X_i^2 - \left(\sum_{i=1}^{N} X_i\right)^2}{N(N - 1)} = \frac{N \sum X^2 - (\sum X)^2}{N(N - 1)}$

For grouped scores with origin at zero the equivalent formulas are:

(5.9)  $s^2 = \frac{\sum_{j=1}^{k} f_j X_j^2 - \left(\sum_{j=1}^{k} f_j X_j\right)^2 / N}{N - 1}$

or, equivalently,

$s^2 = \frac{N \sum_{j=1}^{k} f_j X_j^2 - \left(\sum_{j=1}^{k} f_j X_j\right)^2}{N(N - 1)}$
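The remark about rounding in Formula (5.8a) can be illustrated with exact arithmetic. The following Python sketch is an assumption-laden illustration, not the authors' procedure: it supposes the scores are whole numbers and uses the standard `fractions` module so that nothing is rounded until the final division.

```python
from fractions import Fraction


def variance_from_raw_scores(scores):
    """Formula (5.8a) with origin at zero; exact for integer scores."""
    n = len(scores)
    sum_x = sum(scores)
    sum_x2 = sum(x * x for x in scores)
    # The numerator is an integer; the only division is by N(N - 1).
    return Fraction(n * sum_x2 - sum_x ** 2, n * (n - 1))


scores = [42, 37, 49, 45, 42]
variance = variance_from_raw_scores(scores)
print(variance, float(variance))   # 39/2 19.5
```

Only the two running totals, the sum of the scores and the sum of their squares, need to be accumulated, which is why the raw-score form suits machine computation.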


Taking deviations from an arbitrary origin near the middle 
of the distribution reduces the size of the numbers involved in 
the computations, and is therefore an advantage when the work 
is to be done with pencil and paper. Grouping scores into in- 
tervals and coding the intervals further reduces the size of the 
numbers involved. The width of the interval is often called i 
and will be so denoted here. It will require a little attention 
on the part of the student to distinguish the variable subscript
$i$ from the letter $i$ which indicates the width of the step interval.
As has been noted before, there are not enough letters in the al- 
phabet to provide a unique letter for every use and it is often 
necessary to use the same letter in more than one meaning. If 
any confusion arises it will quickly subside. 


(5.10)  Let $x'_j = \frac{X_j - A}{i}$
where $i$ is the width of the interval and A is an arbitrary origin.

When deviations are taken from an arbitrary origin A the mean
is computed by formula

(5.11)  $\bar{X} = A + i\,\frac{\sum_{j=1}^{k} f_j x'_j}{N}$

and the variance by

(5.12)  $s^2 = i^2\,\frac{N \sum_{j=1}^{k} f_j x'^2_j - \left(\sum_{j=1}^{k} f_j x'_j\right)^2}{N(N - 1)}$

In computing the standard deviation it is obviously simpler to
multiply by $i$ after taking the square root unless the variance
also is needed.

The computation of the mean and variance from grouped data
is shown in Table 5.1.


TABLE 5.1 Computation of Mean and Standard Deviation from Grouped
Data.

Interval     X_j     f_j     x'_j     f_j x'_j     f_j x'_j^2
   1          2       3       4          5             6
 78-82       80       3       4         12            48
 73-77       75       6       3         18            54
 68-72       70       7       2         14            28
 63-67       65      12       1         12            12
 58-62       60      15       0          0             0
 53-57       55      13      -1        -13            13
 48-52       50       9      -2        -18            36
 43-47       45       7      -3        -21            63
 38-42       40       4      -4        -16            64
 33-37       35       2      -5        -10            50
 28-32       30       1      -6         -6            36
 TOTAL               79             56 - 84 = -28    404

$\bar{X} = 60 + \frac{5(-28)}{79} = 60 - 1.77 = 58.23$

$s = 5\sqrt{\frac{79(404) - (-28)^2}{79(78)}} = 5\sqrt{\frac{31132}{6162}} = 5\sqrt{5.0523} = 11.24$

$s^2 = 25(5.0523) = 126.3$
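The coded computation of Table 5.1 can be retraced by machine. The Python sketch below is illustrative only; the variable names are assumptions, and the class midpoints and frequencies are those read from the table, with the arbitrary origin A = 60 and interval width i = 5.

```python
import math

midpoints = [80, 75, 70, 65, 60, 55, 50, 45, 40, 35, 30]
frequencies = [3, 6, 7, 12, 15, 13, 9, 7, 4, 2, 1]
origin, width = 60, 5

# Formula (5.10): coded deviations x' = (X - A) / i, which are whole numbers here.
coded = [(x - origin) // width for x in midpoints]

n = sum(frequencies)
sum_fx = sum(f * c for f, c in zip(frequencies, coded))
sum_fx2 = sum(f * c * c for f, c in zip(frequencies, coded))

mean = origin + width * sum_fx / n                                       # Formula (5.11)
variance = width ** 2 * (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))      # Formula (5.12)
standard_deviation = math.sqrt(variance)

print(n, sum_fx, sum_fx2)
print(round(mean, 2), round(variance, 1), round(standard_deviation, 2))
```

Choosing a different origin changes the coded deviations and the two sums, but not the resulting mean or variance, which is the point of the fourth problem in the exercise that follows.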


EXERCISE 5.2
1. Compute the mean and variance of the following scores, taking an
arbitrary origin at A = 40. Scores: 42, 37, 49, 45, 42

Solution

    X      X - 40     (X - 40)^2
   42         2            4
   37        -3            9
   49         9           81
   45         5           25
   42         2            4
  Sum        15          123

$\bar{X} = 40 + \frac{15}{5} = 43$

$s^2 = \frac{5(123) - (15)^2}{5(4)} = \frac{390}{20} = 19.5$




2. Compute the mean and standard deviation of the following scores 
without grouping, taking an arbitrary origin at A = 80. 
Scores: 74, 79, 87, 91, 76, 79, 85, 81, 80. 

3. Compute the mean and variance for each of these sets of numbers 

a. 15, 10, 19, 6,5 

Бо, =] бит 

c. 0,3,2,0,5,7,1 

4. For the data of the worked example on page 117 compute the mean 
and standard deviation making use of a different arbitrary origin. 


Sampling Distributions. The sampling distributions of the 
mean $\bar{X}$, of the variance $s^2$, of the standard deviation $s$, and of 
other statistics will be needed in this and later chapters for the 
purpose of making inferences about populations. Because a large 
number of new statistics will be used, certain concepts regarding 
sampling distributions already introduced in Chapter 3 will be 
described in the following sections. 

The population from which a sample is drawn is often called 
the parent population. Theoretically if the parent population has 
been described mathematically by some hypothesis the exact 
sampling distribution of any statistic can be derived by mathe- 
matical methods. Actually the derivation of sampling distribu- 
tions often presents considerable mathematical difficulty, so that 
the exact sampling distribution of a particular statistic may not 
be available for practical use even though the population dis- 
tribution can be mathematically described. 

When the exact sampling distribution is not known, or is too 
complex for practical use, an approximate distribution may be 
available. Examples of such approximation have already been 
met when the normal curve was used as an approximation for the 
binomial, and the smooth $\chi^2$ curve was used to approximate the 
discontinuous exact $\chi^2$ distribution. Such approximations will 
also be found useful for large samples from populations on a con- 
tinuous variate. 

The standard deviation of the sampling distribution of a statistic
is commonly called the standard error of the statistic. Thus
$\sigma_p$ is the standard error of a proportion and $\sigma_{\bar{X}}$ the standard error
of a mean. In Chapter 2 the mean and standard error of a proportion
were given as

$\mu_p = E(p) = P$   and   $\sigma_p = \sqrt{E(p - P)^2} = \sqrt{\dfrac{PQ}{N}}$

In Chapter 6 the mean and standard error of a mean will be given
as

$\mu_{\bar{X}} = E(\bar{X}) = \mu$   and   $\sigma_{\bar{X}} = \sqrt{E(\bar{X} - \mu)^2} = \dfrac{\sigma}{\sqrt{N}}$


These cases illustrate the important generalization that the mean 
and standard error of a statistic can be calculated directly from char- 
acteristics of the population and may be known even though the exact 
sampling distribution of the statistic is not known. Early work in 
statistical theory placed great emphasis on the discovery of means 
and standard errors of statistics, and some textbooks still state 
formulas for standard errors without giving any information about 
the sampling distribution. Usually in such cases it is tacitly 
assumed that the sampling distribution is normal. Any text 
which gives the formula for $\sigma_s$ or $\sigma_r$ without any statement about 
the non-normality of the sampling distributions involved is living 
in the past. One distinctive aspect of modern statistical theory 
is its persistent search for the exact sampling distributions of 
statistics and satisfactory approximations to these distributions. 

Unbiased Estimate of a Parameter. Suppose that it is de- 
sired to estimate some population parameter by means of a sample 
statistic. If the parameter is the expectation of the statistic, then 
the statistic is an unbiased estimate of the parameter. Thus $p$
is an unbiased estimate of $P$ and $\bar{X}$ an unbiased estimate of $\mu$.

It is now possible to explain why $N - 1$ is used instead of $N$
in the definition of the sample variance, $s^2$. It can be demonstrated
mathematically that $\dfrac{\sum (X - \bar{X})^2}{N - 1}$ is an unbiased estimate of $\sigma^2$, or
in symbols

(5.13)    $E(s^2) = \sigma^2$

(5.14)    while $E\left(\dfrac{\sum (X - \bar{X})^2}{N}\right) = \dfrac{N - 1}{N}\,\sigma^2$

Therefore Formula (5.14) gives a biased estimate of $\sigma^2$.
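
For a very small population the two expectations in (5.13) and (5.14) can be verified by listing every possible sample. The sketch below does this with exact fractions. The population of four values and the sample size of 2 are hypothetical, chosen only to keep the list short, and sampling is taken to be with replacement so that all ordered samples are equally likely.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4]   # hypothetical population, for illustration only
N = 2                       # sample size

M = len(population)
mu = Fraction(sum(population), M)
sigma2 = sum((x - mu) ** 2 for x in population) / M   # population variance

def sum_sq_dev(sample):
    """Sum of squared deviations of a sample from its own mean."""
    mean = Fraction(sum(sample), len(sample))
    return sum((x - mean) ** 2 for x in sample)

# Every equally likely ordered sample of size N drawn with replacement
samples = list(product(population, repeat=N))

# Expectation of the estimate with divisor N - 1, and with divisor N
expect_div_n_minus_1 = sum(sum_sq_dev(s) / (N - 1) for s in samples) / len(samples)
expect_div_n = sum(sum_sq_dev(s) / N for s in samples) / len(samples)

print(sigma2, expect_div_n_minus_1, expect_div_n)
```

With the divisor N - 1 the average over all samples equals the population variance exactly, while the divisor N gives (N - 1)/N times that value.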

Test of Normality Applied to a Distribution of Sample Vari- 
ances. The data for this computation were obtained by students 
in a class in statistics studying sampling. Each student made 
16 throws of ten pennies thus obtaining samples for which N = 16, 
and computed the variance of each sample. The distribution of 
the variances from 256 such samples is shown graphically in
Figure 5-2, with a superimposed normal curve having the same mean
and standard deviation. The histogram appears a little more
peaked than the normal curve. A test of normality employing $\chi^2$
will now be applied to these data.


Fig. 5-2. Frequency distribution of 256 sample variances, 
with normal curve superimposed. 


TABLE 5.2 Distribution of 256 Sample Variances

Variance    f      Variance    f      Variance    f
5.3-5.4     1      3.7-3.8     3      2.1-2.2    33
5.1-5.2     1      3.5-3.6     8      1.9-2.0    22
4.9-5.0     1      3.3-3.4     7      1.7-1.8    19
4.7-4.8     0      3.1-3.2    15      1.5-1.6    19
4.5-4.6     2      2.9-3.0    16      1.3-1.4    18
4.3-4.4     2      2.7-2.8    21      1.1-1.2     7
4.1-4.2     3      2.5-2.6    23       .9-1.0     3
3.9-4.0     5      2.3-2.4    25       .7- .8     2


1. The first step is to obtain the values of the 10th, 20th,
30th . . . 90th percentiles of a normal curve having

$\mu = \bar{X} = 2.388$   and   $\sigma = s = .816$.

The values of $(X - \mu)/\sigma$ corresponding to these points are read from
a normal probability table and recorded in column 2 of Table 5.3.
Each entry in that column is multiplied by $\sigma = .816$ to obtain
the entries in column 3. To these are then added $\mu = 2.388$ to
obtain the entries in column 4, which are the abscissa values of
the percentiles $s^2_{.10}$, $s^2_{.20}$, etc., on the baseline in Figure 5-2.
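
Columns 2 to 4 of Table 5.3 can also be produced without a printed normal table. The sketch below uses `NormalDist.inv_cdf` from the Python standard library in place of the table; the variable names are ours. Because the library carries more decimals than a three-place table, an entry may differ from the printed one by a unit in the third decimal.

```python
from statistics import NormalDist

mean_s2 = 2.388   # mean of the 256 sample variances
sd_s2 = 0.816     # their standard deviation

unit_normal = NormalDist()   # mean 0, standard deviation 1

rows = []
for pct in range(90, 0, -10):            # 90th, 80th, ..., 10th percentile
    z = unit_normal.inv_cdf(pct / 100)   # column 2
    deviation = z * sd_s2                # column 3
    x_p = mean_s2 + deviation            # column 4
    rows.append((pct, z, deviation, x_p))

for pct, z, deviation, x_p in rows:
    print(f"{pct:3d} {z:7.3f} {deviation:7.3f} {x_p:7.3f}")
```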


TABLE 5.3 Computation of $\chi^2$ Test for Normality of Distribution of
256 Sample Variances

Percentile   (X - mu)/sigma    X - mu     X_p    Frequency    Frequency in     f_i^2
  point                                           below X_p    interval f_i
    1              2              3        4          5             6             7
   90            1.282          1.046    3.434      229.4          26.6         707.56
   80             .842           .687    3.075      209.9          19.5         380.25
   70             .524           .428    2.816      188.3          21.6         466.56
   60             .253           .206    2.594      164.6          23.7         561.69
   50            0              0        2.388      127.8          36.8        1354.24
   40            -.253          -.206    2.182      111.8          16.0         256.00
   30            -.524          -.428    1.960       80.1          31.7        1004.89
   20            -.842          -.687    1.701       53.8          26.3         691.69
   10           -1.282         -1.046    1.342       20.3          33.5        1122.25
                                                                   20.3         412.09
 Total                                                            256.0        6957.22

$\chi^2 = \sum \dfrac{f_i^2}{F_i} - N = \dfrac{6957.22}{25.6} - 256 = 15.8$        $n = 7$

$\chi^2_{.95} = 14.1$        $\chi^2_{.975} = 16.0$


2. The second step is to compute the frequency in the histo- 
gram between each two of the successive points listed in Column 4. 
For this, the frequency is cumulated beginning at the lower end 
of the scale of scores, as in the computation of percentiles and 
of percentile ranks. The frequency below each point is then 
obtained as in column 5 of Table 5.3. 

It may be helpful to explain in detail the computation of one of
the entries in Column 5, say the first one. The frequency below
the point X = 3.434 is desired. From Table 5.2 it is evident
that the point X = 3.434 lies in the interval marked 3.3-3.4, that
there are 7 cases in this interval and 223 cases below it. Therefore
below the point X = 3.434 there will be all of the 223 cases
plus part of the 7 cases. What part of the 7 cases? The interval
has as its real limits the values 3.25 and 3.45, so X = 3.434 stands
nearly at the top of the interval. The proportion of the interval
below this point is

$\dfrac{3.434 - 3.25}{3.45 - 3.25} = \dfrac{.184}{.200}$

and therefore the number of cases in the interval below X = 3.434
is $\dfrac{.184}{.200}(7) = 6.44$ and the entire number of cases below X = 3.434
is 223 + 6.4 = 229.4.
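
The interpolation just described amounts to one line of arithmetic. A minimal sketch follows; the function name and its arguments are ours, and the figures are those of the worked entry for X = 3.434.

```python
def frequency_below(point, lower_limit, upper_limit,
                    freq_in_interval, freq_below_interval):
    """Cases below `point`, interpolating linearly inside its interval."""
    proportion = (point - lower_limit) / (upper_limit - lower_limit)
    return freq_below_interval + proportion * freq_in_interval

# Interval 3.3-3.4 has real limits 3.25 and 3.45, holds 7 cases,
# and has 223 cases below it.
below_3434 = frequency_below(3.434, 3.25, 3.45, 7, 223)
above_3434 = 256 - below_3434     # cases above the point

print(round(below_3434, 1), round(above_3434, 1))
```

The interpolation assumes that the 7 cases are spread evenly over the interval, which is the same assumption made in computing percentiles from grouped data.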

The first entry in Column 6 is 256 — 229.4 = 26.6 or the num- 
ber of cases above X = 3.434. The second is 229.4 — 209.9 = 19.5, 
or the number of cases between X = 3.434 and X = 3.075. All 
subsequent entries in this column are found in the same way. 
The last entry is 20.3 — 0 or the number of cases below X = 1.342. 
The entries in Column 6 are treated as observed frequencies in 
the $\chi^2$ formula. 

3. Since each $F_i = N/10 = 25.6$, the remainder of the computation
is very simple.

$\chi^2 = \sum \dfrac{f_i^2}{F_i} - N = \dfrac{6957.22}{25.6} - 256 = 15.8$

There are 10 observations (10 values of $f_i$) and 3 restrictions upon
them, namely

(1) $\sum f_i = \sum F_i$
    The number of cases in the fitted normal curve is assumed
    to be 256, which was the observed N.

(2) $\sum f_i X_i = \sum F_i X_i$
    This condition was imposed when the theoretical distribution
    was given the same mean as the observed distribution.
    The X's are arbitrary, the restriction being upon the
    frequencies.

(3) $\sum f_i (X_i - \bar{X})^2 = \sum F_i (X_i - \bar{X})^2$
    This condition was imposed when the theoretical distribution
    was given the same variance as the observed distribution.

The number of degrees of freedom is therefore $n = 10 - 3 = 7$.
For 7 degrees of freedom, $\chi^2_{.95} = 14.1$ and $\chi^2_{.975} = 16.0$. Since the
computed value 15.8 exceeds $\chi^2_{.95}$, this test justifies discarding
the hypothesis of normality at the .05 level.
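
The final step of the test can be checked as follows. This sketch takes the ten observed frequencies from column 6 of Table 5.3 as given, and it takes the two critical values from the text rather than computing them. The variable names are ours.

```python
# Observed frequencies from column 6 of Table 5.3
observed = [26.6, 19.5, 21.6, 23.7, 36.8, 16.0, 31.7, 26.3, 33.5, 20.3]

N = 256
expected = N / len(observed)    # each F_i = N/10 = 25.6

# General form of chi-square, and the shortcut that holds when
# every F_i is the same and the observed frequencies sum to N
chi_sq_general = sum((f - expected) ** 2 / expected for f in observed)
chi_sq_shortcut = sum(f * f for f in observed) / expected - N

restrictions = 3                # N, the mean, and the variance were fitted
df = len(observed) - restrictions

# Tabled values for 7 degrees of freedom, as quoted in the text
CHI2_95, CHI2_975 = 14.1, 16.0

print(round(chi_sq_shortcut, 1), df)
print(chi_sq_shortcut > CHI2_95, chi_sq_shortcut > CHI2_975)
```

The two forms agree because the sum of the observed frequencies equals the sum of the expected frequencies, which is restriction (1).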

In this problem, $\mu$ and $\sigma$ could have been obtained from the
mathematical characteristics of the sampling distribution of $s^2$.
However, in most similar situations where a $\chi^2$ test is employed
there are no a priori values for $\mu$ and $\sigma$ and these must be estimated
from $\bar{X}$ and $s$. The latter method has been used here to illustrate
what is the usual procedure.

REVIEW EXERCISE 


1. Some of the following terms have been used in earlier chapters, and 
all of them are important for understanding this chapter. 


degrees of freedom parent population 

family of curves sampling distribution of a statistic 

limits of summation standard error 

normal population statistic 

normal probability curve unbiased estimate 

parameter variable subscript 

2. Review the meaning of these symbols, and decide which of them 
represent 


(a) a number which always has the same value 

(b) a number which has the same value for any given population but 
which may have different values for different populations 

(c) a number which changes its value from sample to sample 

(d) a symbol which indicates an operation to be performed 


x, ш Aly фе Х- о, and ux. 


6 Sampling Distributions 


Purposes of the Chapter.* The principal goal of this 
chapter is to give the student a sense of familiarity with the 
different sampling distributions which are most often called for 
in research studies, to acquaint him with the general shape of 
these distributions and with the more common situations to which 
they apply. To this end empirical distributions will be obtained 
by drawing many small samples, computing statistics from these 
samples, and plotting the frequency distributions of the various 
statistics. On the empirical distribution of each statistic there 
will be superimposed a mathematical curve obtained from the 
formula for the sampling distribution of that statistic. 

The chapter has been designed to accomplish certain second- 
ary goals also. It provides experience in the use of a table of 
random numbers. It provides a review of the principal computa- 
tions of elementary statistics, that review occurring incidentally 
in the course of obtaining the empirical data needed rather than 
occurring for its own sake. It provides an introduction to the 
material treated more fully in Chapters 7 to 9. 

Here too the student may observe an instance of that amazing 
parallelism so often noted between certain aspects of the world 
of phenomena and the world of abstract mathematical theory. 
One performs certain physical operations, draws a large number 
of samples, measures the individuals in those samples, makes cer- 
tain computations upon those measurements, draws a graph of 
the resulting statistics and lo, the form of the graph is similar 
to that of a curve determined by purely mathematical reasoning 
and having no implicit connection with the concrete data. Further- 
more, the shape of the graph obtained through this extensive 
manipulation of concrete measures is independent of the nature 
of the trait measured. That shape is the same whether measurements
are made of human beings, manufactured products or shellfish,
whether the measures belong in the domain of physics,
psychology, business, or public opinion. The shape of the graph
depends only on certain characteristics of the population
distribution, the size of the samples drawn, and the choice of the
statistic computed.

* This chapter will be most helpful if you read it rapidly the first time without
pressing too hard for complete understanding, and if you return to it often as you study
the next three chapters. It would be wise to reread it completely after you have studied
Chapter 9.

The student who has completed the calculation of a statistic 
usually wishes to use that statistic as basis for assertions about an 
unknown value of a parameter in the population. Of necessity 
he must check the statistic against a table derived from the mathe- 
matically calculated sampling distribution of the statistic. The 
observed parallelism between the empirical and the mathematical 
distributions should convince the student that the mathematical 
tables are a reflection of experiential reality. Then if the mathe- 
matical distribution indicates that the research worker should 
accept a hypothesis because his sample may have originated from 
the hypothetical population by chance, repeated sampling, which 
usually he would be unable to carry out, would have given him 
the same information. 

The student who goes carefully through the material of this 
chapter should never afterward fall prey to the popular fallacy 
that all sampling distributions are normal and he should know 
that the beneficent parallelism referred to in the preceding paragraph
is not discovered automatically but must be striven for by 
learning which probability distribution belongs to any particular 
statistic. 

The Data. In Table XXIV of the Appendix is a set of pre- 
registration scores on the Cooperative Test Service English Test 
made by 447 college students. The distribution of scores has been 
tested for normality as described in Chapter 5 and the departure 
from normality has been found negligible. 

Because of the close approximation of its distribution to the 
normal, the set of 447 scores will play the part of a normal popula- 
tion in the remainder of this chapter. Random samples drawn 
from this set of scores will be considered as random samples from 
a normal population and will illustrate sampling from a true 
normal population. 

The mean, variance and standard deviation of the scores are 


$\mu = 121.6$;   $\sigma^2 = 1380.4$;   $\sigma = 37.15$


In Table XXIV the entries in the first column are serial num- 
bers identifying the subjects. Each serial number has 3 digits 




if zeros are inserted for missing digits in the tens or hundreds 
place, so that 9 is written 009 and 21 is written 021. 

These scores in Table XXIV are now to be considered as de- 
fining a population from which random samples are to be drawn 
by means of a table of random numbers. If the same number 
should be drawn more than once, the individual represented by 
that serial number should be included as often as his number is 
drawn. This procedure is called sampling with replacement and 
has the effect of transforming the finite population of 447 cases 
into an infinite population with the same percentage distribution. 

Use of Table of Random Sampling Numbers. Several lists of 
random numbers are available.* Appendix Table XXIII is a 
part of the Table of 105,000 Random Decimal Digits published 
by the Interstate Commerce Commission. 

Suppose, for example, that it is desired to choose 6 cases at 
random out of a set of 50 cases. As 50 has 2 digits, we shall read 
2-digit numbers from the table disregarding all those larger than 
50. The first 6 numbers read will identify the 6 individuals se- 
lected. We should decide, preferably in a random manner, where 
to enter the table before we look at it, lest after we see the page 
we may be influenced by a possible desire to include or to exclude 
a particular individual. 

If we should decide to begin with block 1 and line 1, using the
first 2 digits, and to move downward across the table, the 6
individuals chosen would be those having code numbers 10, 22, 24,
42, 37, and 28. Note that 4 numbers have been passed over because
they are larger than 50. If we should decide to begin in
block 9 and row 50, using the first 2 digits and moving upward
across the table, the individuals chosen would be those with code
numbers 27, 2, 39, 32, 4, and 44.

* Tippett, L. H. C., Random Sampling Numbers, with foreword by Karl Pearson,
Cambridge University Press, 1927.
Fisher, R. A. and Yates, F., Table XXXIII in Statistical Tables for Biological,
Agricultural and Medical Research, Oliver and Boyd, London, 1938.
Interstate Commerce Commission, Bureau of Transport Economics and Statistics,
Table of 105,000 Random Decimal Digits, Statement No. 4914, File No. 261-A-1,
Washington, D.C., May 1949, 29 pp.
Kendall, M. G. and Smith, B. Babington, Tables of Random Sampling Numbers, Tracts
for Computers, No. 24, Cambridge University Press, 1939.
Snedecor, George W., Statistical Methods, Ames, Iowa, Iowa State College Press, 1946,
pp. 10-13.
The general criteria that a set of such numbers must meet in order to be useful in
sampling have been stated by Kendall and Smith in “Randomness and Random Sampling
Numbers,” Journal of the Royal Statistical Society, 101 (1938), pp. 147-166.

It is permissible to begin at any point in the table, and to move 
in any direction which is convenient. The variety of ways of 
selecting digits from this table is enormous. Obviously a simple 
method by which numbers can be easily identified is preferable to 
a complicated method. 
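
The rule for reading the table can be stated as a short routine. The sketch below is illustrative only. The function name is ours, and the column of readings is hypothetical: the six accepted numbers are those quoted in the text, but the four rejected numbers are invented, since the table itself is not reproduced here. A number drawn twice is kept twice, on the assumption that sampling is with replacement as described earlier.

```python
def pick_cases(two_digit_numbers, population_size, how_many):
    """Read numbers in order, passing over any that identify no case."""
    chosen = []
    skipped = 0
    for number in two_digit_numbers:
        if 1 <= number <= population_size:
            chosen.append(number)
        else:
            skipped += 1          # too large (or 00): pass over it
        if len(chosen) == how_many:
            break
    return chosen, skipped

# Hypothetical column of 2-digit readings from a random number table
readings = [10, 22, 24, 42, 37, 77, 99, 96, 89, 28, 31, 5]

chosen, skipped = pick_cases(readings, population_size=50, how_many=6)
print(chosen, skipped)
```

For the population of 447 scores the same routine would be used with 3-digit readings and a population size of 447.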

A Class Project in Drawing Samples, Computing Certain 
Statistics and Obtaining the Random Sampling Distributions of 
those Statistics. If this project is undertaken by any group of 
persons it is essential that (a) all samples be drawn at random, 
(b) all samples be of the same size, (c) any two samples be inde- 
pendent. Unless these conditions are observed, the statistics can- 
not be expected to conform to the pattern of the theoretically 
established sampling distribution. Condition (c) will require 
some planning if a table of random sampling numbers is employed, 
to provide that no two persons use the table in quite the same 
way. The steps listed below have been carried out and results 
recorded in Specimen Worksheets. The student will find it con- 
venient to prepare similar forms in advance. 

In the directions set down here the sample size has been kept 
small for several reasons. (1) As sample size increases, the distri- 
butions of many statistics become more like the normal curve. The 
difference in form of the various curves is more dramatically por- 
trayed when N is small. (2) Drawing a number of small samples 
and combining them into one larger sample will help to make 
vivid the change which takes place in the sampling distribution 
of a statistic as N increases. (3) For a given expenditure of time 
the student will learn more by computing a variety of statistics 
for each of a few small samples than by computing fewer statistics 
for larger samples. 

The steps in computing the required statistics are as follows: 

1. From a table of random numbers draw 5 numbers of 447 
or smaller and enter them in the first row of Worksheet I in the 
column for Sample I. Similarly draw and record 5 serial numbers 
for each of the other three samples. 

2. Treating each number obtained in Step 1 as a serial number, 
read from Table XXIV the score of the subject that has that 
serial number and record it in row 2 of Worksheet I. 

3. Carry out the computations indicated in rows 3 through
11 of Worksheet I and check in the manner indicated at the
bottom of the Worksheet. The mean ($\bar{X}$), the variance ($s^2$), and
the standard deviation ($s$) have now been obtained for each small
sample of 5 and for the composite sample of 20.

SPECIMEN WORKSHEET I

                                                 Sample of 5                        Composite
Operation                              1           2           3           4        sample of 20
 1. Random serial numbers          018, 074,   191, 296,   088, 015,   092, 414,
                                   308, 398,   117, 420,   284, 071,   171, 102,
                                   108         395         122         392
 2. Corresponding scores, X_ij     127, 165,   170, 157,   169, 137,   180, 78,
                                   47, 103,    105, 51,    123, 104,   152, 142,
                                   91          40          98          58
 3. Sum of scores, $\sum X$            533         523         631         610         2297
 4. Mean, $\bar{X} = \sum X/N$         106.6       104.6       126.2       122         114.85
 5. Sum of squares of scores,
    $\sum X^2$                         64453       68775       82879       85116       301223
 6. Correction term,
    $(\sum X)^2/N$                     56817.8     54705.8     79632.2     74420       263810.45
 7. $\sum x^2 = \sum X^2 - (\sum X)^2/N$   7635.2  14069.2     3246.8      10696       37412.55
 8. $s^2 = \sum x^2/(N - 1)$           1908.8      3517.3      811.7       2674        1969.08
 9. $d = \bar{X}_i - \bar{X}_{..}$     -8.25       -10.25      11.35       7.15
10. $N d^2$                            340.3125    525.3125    644.1125    255.6125
11. $s = \sqrt{s^2}$                   43.7        59.3        28.5        51.7        44.4

Checks: Row 3    533 + 523 + 631 + 610 = 2297
        Row 4    1/4 (106.6 + 104.6 + 126.2 + 122) = 114.85
        Row 5    64453 + 68775 + 82879 + 85116 = 301223
        Row 7    7635.2 + 14069.2 + 3246.8 + 10696 = 35647.2
        Row 10   340.3125 + 525.3125 + 644.1125 + 255.6125 = 1765.35
        Row 7, entry in last column: 35647.2 + 1765.35 = 37412.55

4. Enter the computed statistics on Worksheet II and com- 
plete the additional computations suggested there in rows 1 to 10. 


SPECIMEN WORKSHEET II
Record of Statistics Computed from Samples

                                                                    Sample of 5                 Sample
Statistic                                                     1         2         3       4     of 20
 1. $\bar{X}$                                               106.6     104.6     126.2    122
 2. $\bar{X}$                                                                                   114.85
 3. $\bar{X} - \mu = \bar{X} - 121.6$                       -15.0     -17.0       4.6     .4
 4. $\bar{X} - \mu = \bar{X} - 121.6$                                                           -6.75
 5. $(\bar{X} - \mu)\sqrt{N}/\sigma = (\bar{X} - 121.6)/16.61$   -.903    -1.023      .277    .024
 6. $(\bar{X} - \mu)\sqrt{N}/s = (\bar{X} - 121.6)\sqrt{5}/s$    -.768     -.641      .362    .017
 7. $(\bar{X} - \mu)\sqrt{N}/\sigma = (\bar{X} - 121.6)/8.31$                                   -.812
 8. $(\bar{X} - \mu)\sqrt{N}/s = (\bar{X} - 121.6)\sqrt{20}/s$                                  -.680
 9. $\sum x^2/\sigma^2 = \sum x^2/1380.4$                     5.53     10.19      2.35    7.75
10. $\sum x^2/\sigma^2 = \sum x^2/1380.4$                                                       27.10


5. On Worksheet III, enter the following statistics:

Line 1  The difference of two means taken from line 4 of I.
        Thus $\bar{X}_1 - \bar{X}_2 = 106.6 - 104.6 = 2.0$
Line 2  The sum of 2 entries on line 7 of I
Line 3  Computation based on entry in preceding line
Line 4  Entry in line 1 divided by entry in line 3
Line 5  Ratio of two variances taken from line 8 of I.
        Thus 1908.8/3517.3 = .54

SPECIMEN WORKSHEET III
Record of Statistics for Comparison of Samples

                                                     Samples compared
Statistic                                          1,2        2,3        3,4
1. $\bar{X}_i - \bar{X}_j$                          2.0      -21.6        4.2
2. $\sum x_i^2 + \sum x_j^2$                    21704.4    17316.0    13942.8
3. $s_{\bar{X}_i - \bar{X}_j}$                     32.9       29.4       26.4
4. $(\bar{X}_i - \bar{X}_j)/s_{\bar{X}_i - \bar{X}_j}$   .06       -.73        .16
5. $s_i^2/s_j^2$                                    .54       4.33        .30
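
The entries of the three specimen worksheets can be recomputed from the twenty scores. The sketch below is offered only as a check on the hand work, and its function and variable names are ours. One point is an assumption on our part: the standard error of the difference on line 3 of Worksheet III is computed in the pooled form, the sum of the two sums of squares divided by N1 + N2 - 2 and multiplied by (1/N1 + 1/N2). This form reproduces the specimen entries, but the worksheet does not state it.

```python
from math import sqrt

MU, SIGMA2 = 121.6, 1380.4      # population mean and variance from the text
SIGMA = sqrt(SIGMA2)

# Scores of the four specimen samples (row 2 of Worksheet I)
samples = [
    [127, 165, 47, 103, 91],
    [170, 157, 105, 51, 40],
    [169, 137, 123, 104, 98],
    [180, 78, 152, 142, 58],
]

def describe(scores):
    """Worksheet I rows 3-8 and 11, and Worksheet II rows 5, 6 and 9."""
    n = len(scores)
    total = sum(scores)
    mean = total / n
    sum_sq_dev = sum(x * x for x in scores) - total ** 2 / n
    variance = sum_sq_dev / (n - 1)
    s = sqrt(variance)
    return {
        "n": n, "mean": mean, "sum_sq_dev": sum_sq_dev,
        "variance": variance, "s": s,
        "z": (mean - MU) * sqrt(n) / SIGMA,   # uses the known sigma
        "t": (mean - MU) * sqrt(n) / s,       # uses the sample s
        "chi_sq": sum_sq_dev / SIGMA2,
    }

def compare(a, b):
    """Worksheet III for two samples (pooled form assumed on line 3)."""
    diff = a["mean"] - b["mean"]
    pooled = (a["sum_sq_dev"] + b["sum_sq_dev"]) / (a["n"] + b["n"] - 2)
    se_diff = sqrt(pooled * (1 / a["n"] + 1 / b["n"]))
    return {"diff": diff, "se_diff": se_diff, "ratio": diff / se_diff,
            "variance_ratio": a["variance"] / b["variance"]}

stats = [describe(s) for s in samples]
composite = describe([x for s in samples for x in s])
first_pair = compare(stats[0], stats[1])

print(round(stats[0]["variance"], 1), round(stats[0]["t"], 3))
print(round(composite["chi_sq"], 2), round(first_pair["se_diff"], 1))
```

A class carrying out the project could apply `describe` to each sample drawn and collect the resulting values of each statistic to form the empirical distributions discussed in the rest of the chapter.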




We shall now consider some general issues before examining the 
empirical distributions which have been produced by classes carry- 
ing out the sampling project which has been described above. 

Samples from a Normal Population. A large part of this book 
is concerned with the theory of samples drawn from a normal 
population. The sampling distributions which will be used are, 
therefore, based on this assumption. However it should be recog- 
nized that empirical studies of samples from non-normal popula- 
tions indicate that a considerable departure from normality does 
not invalidate the methods described. 

We shall, in this chapter, describe four of the most useful 
smooth sampling distributions of statistics calculated for samples 
from a normal population. Before proceeding with the descrip- 
tion of these distributions it is desirable to clarify the concepts of 
independence among statistics and of degrees of freedom. 

Independence of Statistics. An unexpected property of statistics
calculated for samples from a normal population is that
certain pairs of statistics based on the same observations are
independent of each other. This property does not hold for all
statistics but it does hold for the mean and the variance and also
for the mean and the standard deviation. It does not hold even
for these statistics if the population is not normal.

When the mean and the standard deviation are both calculated
for each of many samples, they vary independently of each
other. This principle is proved mathematically in treatises on
the mathematical theory of statistics. In this text the independence
in the variation of the mean and the standard deviation is
illustrated graphically in Figure 6-1 which shows the joint
distribution of these two statistics from 144 samples of 5 cases each
drawn by students carrying out the sampling project. Without
this independence between $\bar{X}$ and $s$ (or $s^2$) from the same sample
several of the procedures about to be described would not be
correct. It is indeed a beneficent principle which very few people
would recognize intuitively. Statistics from two or more
independently drawn random samples are always independent
regardless of the nature of the population which is being sampled.
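
The independence of mean and standard deviation can also be examined by artificial sampling. The sketch below draws many samples of 5 and computes the correlation between the two statistics, once for a normal population and once for a skewed one. The exponential population, the seed, and the number of samples are our own choices for illustration, not part of the class project. A correlation near zero is consistent with independence but does not by itself prove it.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Product-moment correlation of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def mean_and_sd(sample):
    n = len(sample)
    m = sum(sample) / n
    return m, sqrt(sum((x - m) ** 2 for x in sample) / (n - 1))

def correlation_of_mean_and_sd(draw, n_samples=4000, size=5):
    """Draw n_samples samples and correlate their means with their s."""
    pairs = [mean_and_sd([draw() for _ in range(size)])
             for _ in range(n_samples)]
    means, sds = zip(*pairs)
    return pearson_r(means, sds)

rng = random.Random(1953)   # fixed seed so that the run can be repeated

# Normal population with the mean and standard deviation of Table XXIV
r_normal = correlation_of_mean_and_sd(lambda: rng.gauss(121.6, 37.15))
# Hypothetical skewed (exponential) population for contrast
r_skewed = correlation_of_mean_and_sd(lambda: rng.expovariate(1 / 37.15))

print(round(r_normal, 3), round(r_skewed, 3))
```

In the skewed population a sample containing one very large observation tends to have both a large mean and a large standard deviation, which is why the two statistics are no longer independent there.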

Degrees of Freedom. The concept of degrees of freedom in 
connection with $\chi^2$ was discussed in Chapter 4. It also arises in 
connection with observations made on a continuous scale and the 
correct choice of the number of degrees of freedom is often a very 
important part of the correct solution for a problem. 


[Figure: joint frequency table, scale of $\bar{X}$ (70-74 to 160-164) against scale of $s$.]

Fig. 6-1. Joint frequency distribution of mean and standard deviation from
144 samples of 5 cases each. r = -.02

If N independent observations are made, these observations 
have N degrees of freedom, since each observation is free to take 
any value in the possible range of values. If those N observations 
are from a normal population and if from them a mean and a 
standard deviation (or a variance) are both computed, then these 
N degrees of freedom are partitioned so that the distribution of 
the mean has 1 degree of freedom and the distribution of the 
standard deviation (or the variance) has N — 1 degrees of free- 
dom. In succeeding chapters the N degrees of freedom of N 
independent observations will be partitioned in various ways 
among independent statistics computed from those observations. 




Such partitioning of degrees of freedom is possible only when the 
statistics possess independence. 

Five Important Distributions. While the number of forms of 
sampling distribution is almost endless, there are five forms (binomial,
normal, chi-square, “Student’s,” and F distribution) which 
are used so often that they are tabulated in practically every text- 
book dealing with statistical inference. Three of these, the bi- 
nomial, the normal, and the chi-square distribution, have been 
used in preceding chapters of this book. The binomial has already 
been considered at some length and in this chapter we shall take 
a preliminary look at the other four. Before discussing these dis- 
tributions in any detail, it will be helpful to give a general idea 
of the type of statistic which gives rise to each of them. 

Statistics Which Have a Normal Distribution. When the dis- 
tribution of the parent population is normal it can be proved 
mathematically that the sampling distribution of certain statistics 
is also normal. Among such statistics are the mean of a sample 
and the difference between the means of two samples. It will be 
helpful to give a general description of such normally distributed 
statistics and then to show how certain familiar statistics can be 
recognized as belonging to this general class. 

If $X_i$ represents a random observation from a normal population
and $C_i$ is a constant, then $X_i + C_i$, $X_i - C_i$, $C_i X_i$, and $X_i/C_i$
are all normally distributed. Also if $X_i$ and $X_j$ are independent
random observations from a normal population, then $X_i + X_j$ and
$X_i - X_j$ are both normally distributed. Also the more general
expression

(6.1)    $C_0 + C_1 X_1 + C_2 X_2 + \cdots + C_N X_N$

represents a normally distributed variable if the C’s are constants
and the X’s are independent random observations from a normal
population. Since each term in (6.1) is of first degree in X that
expression is called a linear function of X or a linear function of
the observations.

The mean of a sample can be obtained from (6.1) by letting $C_0$
equal 0 and letting $C_1, C_2, \ldots, C_N$ each equal $1/N$. Hence the mean
of a random sample from a normal population is normally
distributed.
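
As a brief check on the step from (6.1) to the mean, the expectation and variance of a linear function of independent observations may be written out. The symbol $L$ is introduced here only for convenience and is not part of the notation of the text. Each $X_i$ is assumed to have mean $\mu$ and variance $\sigma^2$.

```latex
\begin{align*}
L &= C_0 + C_1 X_1 + C_2 X_2 + \cdots + C_N X_N \\
E(L) &= C_0 + \mu \sum_{i=1}^{N} C_i ,
  \qquad \sigma_L^2 = \sigma^2 \sum_{i=1}^{N} C_i^2 \\
\intertext{With $C_0 = 0$ and $C_1 = C_2 = \cdots = C_N = 1/N$, $L$ is the sample mean:}
E(\bar{X}) &= \mu \cdot N \cdot \frac{1}{N} = \mu ,
  \qquad \sigma_{\bar{X}}^2 = \sigma^2 \cdot N \cdot \frac{1}{N^2} = \frac{\sigma^2}{N} ,
  \qquad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}}
\end{align*}
```

These are the mean and standard error of the mean quoted in Chapter 5. The normal form of the distribution depends on the normality of the population, but the two moments do not.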

An interesting and important fact is that even if the population
is not normally distributed, the sampling distribution of the
mean may often be well approximated by the normal distribution
if samples are large (say N > 30). Under such circumstances,
the distribution of the mean can be treated as normal with mean
$\mu$ and standard error $\sigma/\sqrt{N}$.

If $\bar{X}$ is normally distributed, the statistic $\bar{X} - \mu$, obtained from
$\bar{X}$ by subtracting a constant, is also normally distributed with
mean 0 and standard error $\sigma/\sqrt{N}$. Then the statistic

$\dfrac{\bar{X} - \mu}{\sigma/\sqrt{N}} = \dfrac{(\bar{X} - \mu)\sqrt{N}}{\sigma}$

has the unit normal distribution. In Chapter 7 a normal distribution
related to the difference between the means of two samples
will be considered. In Chapter 10 a regression coefficient will be
treated as a linear function of the observations and as therefore
having a normal distribution.

Statistics Which Have a Chi-square Distribution. If $X_i$ is
normally distributed then certain statistics of the form

(6.2)    $C_1(X_1 - \bar{X})^2 + C_2(X_2 - \bar{X})^2 + \cdots + C_N(X_N - \bar{X})^2$

have the chi-square distribution. Here the C’s are constants and
deviations from the mean are squared.

A very important statistic of this type is one in which all the
C’s have the same value, $1/\sigma^2$. This statistic

$\dfrac{\sum (X_i - \bar{X})^2}{\sigma^2} = \dfrac{(N - 1)s^2}{\sigma^2}$

will be discussed in Chapter 8. It has N - 1 degrees of freedom.
When the subject of analysis of variance is discussed, the X’s will
often be replaced by the means of subsamples.

The statistic $\sum \dfrac{(f_i - F_i)^2}{F_i}$ treated in Chapter 4 is similar to (6.2)
because for large samples $(f_i - F_i)$ has a distribution which is
approximately normal with mean zero, like that of $(X - \bar{X})$, and
$C_i$ may be taken as $1/F_i$. Consequently the chi-square distribution
was found to be a satisfactory approximation to the distribution
of this statistic unless the number of observations is small.

Several classes of students in a statistics course have drawn
random samples of 5 cases each from the data of Table XXIV
as described at the beginning of this chapter. The statistic
$\sum (X - \bar{X})^2/\sigma^2$ was computed from each of 125 of these samples
and the results recorded in Table 6.1. Figure 6-2 shows the
distribution of these empirical values with the theoretical $\chi^2$
distribution for 4 degrees of freedom superimposed. Read a few values
from Table VIII, and mark their position on the baseline of the graph,
as for example $\chi^2_{.10} = 1.1$, $\chi^2_{.50} = 3.36$ and $\chi^2_{.90} = 7.8$.
From each of the random samples you have yourself drawn compute
the same statistic and mark its position on the baseline.

Fig. 6-2. Distribution of $\sum (X - \bar{X})^2/\sigma^2$ in 125 samples of 5 cases
each, and theoretical $\chi^2$ distribution with 4 degrees of freedom.

If the same statistic were computed for each of the composite
samples of 20 cases, its distribution would have 19 degrees of freedom
instead of 4 and would be considerably less skewed than the one
illustrated in Figure 6-2.


TABLE 6.1 Distribution of $\sum (X - \bar{X})^2/\sigma^2$ in 125 Samples of 5 Cases Each

Statistic    f    cum. %      Statistic    f    cum. %      Statistic    f    cum. %
  17.2*      1    100.0         11.7            95.2           5.7       6     74.4
  16.7       1     99.2         11.2       2    95.2           5.2       6     69.6
  16.2             98.4         10.7            93.6           4.7      10     64.8
  15.7       1     98.4         10.2            93.6           4.2       8     56.8
  15.2             97.6          9.7       2    93.6           3.7       7     50.4
  14.7             97.6          9.2       3    92.0           3.2       4     44.8
  14.2       1     97.6          8.7       2    89.6           2.7      16     41.6
  13.7             96.8          8.2       4    88.0           2.2       8     28.8
  13.2             96.8          7.7       2    84.8           1.7      11     22.4
  12.7       1     96.8          7.2       5    83.2           1.2       7     13.6
  12.2       1     96.0          6.7       4    79.2            .7       7      8.0
                                 6.2       2    76.0            .2       3      2.4

* Middle of the interval.


The cumulative $\chi^2$ distribution with 4 degrees of freedom is
drawn in Figure 6-3 and the points determined by the cumulative
percents from the empirical distribution of Table 6.1 are drawn
by heavy dots. The cumulative distributions show very strikingly
the correspondence between the empirical distribution obtained
from values computed by many different persons from random
samples they had drawn and the $\chi^2$ curve plotted from a
mathematical formula.


Statistics Which Have " Student's ^ Distribution - 135 


1.0 
d Е „>= 
2.8 Gcr HR 
© т. ^| 
а 
2 Е : 
9 .6 = 
а. 
co Q 
2.4 
аз 
5 2 Е 
B2 
1 


o 
= 


2 3 4 5 6 7 8 9 о 18142 
Value of Z (x-x)/o? 


Tia, 6-3. Cumulative x? distribution with 4 degrees of freedom and cumulative 
percents of distribution of 125 values of 2(X — X)?/c? from samples of 5 cases. 
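
The classroom sampling experiment can be imitated on a computer. What follows is only a minimal sketch in Python, not the procedure the students followed: it assumes a normal population with μ = 121.6 and σ = 37.15 (the population values quoted in Chapter 7), whereas the students drew from the actual data of Table XXIV. The function names, the seed, and the number of samples are illustrative choices.

```python
import random
import statistics

MU, SIGMA = 121.6, 37.15   # values quoted in the text; normality is an assumption here

def chi_square_statistic(sample, sigma=SIGMA):
    """Return sum((X - mean)^2) / sigma^2 for one sample."""
    mean = statistics.fmean(sample)
    return sum((x - mean) ** 2 for x in sample) / sigma ** 2

def simulate(n_samples=5000, n=5, seed=1):
    """Compute the statistic for n_samples random samples of n cases each."""
    rng = random.Random(seed)
    return [chi_square_statistic([rng.gauss(MU, SIGMA) for _ in range(n)])
            for _ in range(n_samples)]

values = simulate()
mean_value = statistics.fmean(values)                              # theory: n - 1 = 4
share_below_median = sum(v < 3.36 for v in values) / len(values)   # theory: about .50
print(round(mean_value, 2), round(share_below_median, 3))
```

If the sketch behaves as expected, the mean of the simulated values should lie near 4 and about half of them should fall below 3.36, the 50th percentile read from the χ² table for 4 degrees of freedom.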


Statistics Which Have “Student’s”* Distribution. In making
inferences about the mean of a population it is necessary to
take into account the standard deviation of that population. In
most practical situations neither μ nor σ is known and their
estimates X̄ and s must be used.

* “Student” was a well-known British statistician named William Sealy Gosset
(1876-1937) who was adviser to the Guinness brewery in Dublin. A ruling of that firm
forbidding their employees to publish the results of research was relaxed to allow him
to publish mathematical and statistical research under a pseudonym. His paper on
“The Probable Error of a Mean,” published in Biometrika in 1908, which first called
attention to the fact that the normal curve does not properly describe the distribution
of the ratio of mean to standard error in small samples, is now a classic.

In Chapter 7 it will be seen that an important statistic in
problems about the mean is

(6.3)    $t = \dfrac{(\bar{X} - \mu)\sqrt{N}}{s}$

which has “Student’s” distribution when the observations are
independent and normally distributed. This statistic can be
written as $t = (\bar{X} - \mu)/s_{\bar{X}}$, where

(6.4)    $s_{\bar{X}} = \dfrac{s}{\sqrt{N}}$

is a sample estimate of the standard error $\sigma/\sqrt{N}$. In
order that the t ratio shall have “Student’s” distribution the
numerator must be a normally distributed variable with mean zero
and the denominator must be the square root of a variable which
is distributed as χ² independently of the numerator. To
illustrate the difference in the distributions of
$(\bar{X} - \mu)\sqrt{N}/\sigma$ and $(\bar{X} - \mu)\sqrt{N}/s$,
Table 6.2 has been prepared. For 338 random samples of 5 cases
each obtained in the sampling experiment, the statistic

    $\dfrac{\bar{X} - \mu}{\sigma/\sqrt{N}} = \dfrac{\bar{X} - 121.6}{16.61}$

was computed and the results tabulated in the second column of
Table 6.2. For 459 samples the statistic

    $\dfrac{\bar{X} - \mu}{s/\sqrt{N}} = \dfrac{(\bar{X} - 121.6)\sqrt{5}}{s}$

was computed and the results tabulated in the third column. A
glance at the table is enough to reveal that the latter distribution
has a much wider spread and that extreme deviates are more
frequent. The cumulative percents for the two distributions are
shown in columns 4 and 5 of Table 6.2.

A cumulative normal curve has been drawn in Figure 6-4.
(This is easily done from the figures in Table I or Table II in the
Appendix.) The cumulative percents for the distribution of
$(\bar{X} - 121.6)/16.61$, listed in column 4 of Table 6.2, are
represented by heavy dots which cluster closely around the
cumulative normal curve. The cumulative percents for the
distribution of $(\bar{X} - 121.6)\sqrt{5}/s$, taken from column 5,
are represented by small crosses. These depart considerably from
the normal curve at the ends of the distribution, and it is the
ends of the distribution which are strategic in most problems.

Fig. 6-4. Cumulative normal distribution with cumulative percents
of distribution of $(\bar{X} - \mu)\sqrt{5}/\sigma$ shown by dots and
cumulative percents of distribution of $(\bar{X} - \mu)\sqrt{5}/s$ by
small crosses. (Vertical axis: cumulative probability.)

TABLE 6.2 Distribution of $(\bar{X} - \mu)\sqrt{5}/\sigma$ and of
$(\bar{X} - \mu)\sqrt{5}/s$ from Samples of 5 Cases Each

                                         Percent of frequency below
 Upper limit        Frequency             upper limit of interval
 of interval    with σ     with s          with σ       with s

    4.5                       1                         100.0
    4.2                                                  99.8
    3.9                                                  99.8
    3.6                       5                          99.8
    3.3                       1                          98.7
    3.0                       3                          98.5
    2.7                       6                          97.8
    2.4            1          6            100.0         96.5
    2.1            6         10             99.7         95.2
    1.8            7          7             97.6         93.0
    1.5           15         14             95.9         91.5
    1.2           17         30             91.4         88.4
     .9           24         26             86.4         81.9
     .6           36         40             79.3         76.3
     .3           34         42             68.6         67.5
     0            57         79             58.6         58.4
    -.3           38         49             41.7         41.2
    -.6           35         29             30.5         30.5
    -.9           16         35             20.1         24.2
   -1.2           25         25             15.4         16.6
   -1.5           22         12              8.0         11.1
   -1.8            4         11              1.5          8.5
   -2.1                       8               .3          6.1
   -2.4                       5               .3          4.4
   -2.7            1          4               .3          3.3
   -3.0                       3                           2.4
   -3.3                       4                           1.7
   -3.6                       1                            .9
   -3.9                       2                            .6
   -4.2                       1                            .2

 TOTAL           338        459
 MEAN          0.007      0.016
 VARIANCE       0.84       1.63

Columns headed "with σ" refer to $(\bar{X} - \mu)\sqrt{5}/\sigma$;
columns headed "with s" refer to $(\bar{X} - \mu)\sqrt{5}/s$.

In Chapter 7 there will be a discussion of the theoretic
distribution to which these points do conform. It is “Student’s”
distribution with N - 1 degrees of freedom, and for these samples
N - 1 = 4. The cumulative curve for this distribution has been
drawn in Figure 6-5, with empirical values shown by dots and
crosses as before. Here the crosses representing points on the
distribution for $(\bar{X} - 121.6)\sqrt{5}/s$ conform closely to
the theoretical curve and the dots do not.

Fig. 6-5. Cumulative “Student’s” distribution with 4 degrees of
freedom, cumulative percents of distribution of
$(\bar{X} - \mu)\sqrt{5}/\sigma$ shown by dots, and cumulative
percents of distribution of $(\bar{X} - \mu)\sqrt{5}/s$ by small
crosses. (Vertical axis: cumulative probability.)

If the statistic $(\bar{X} - \mu)\sqrt{N}/\sigma$ were computed for
each sample of 20 cases, obtained by combining 4 samples of 5
cases, its distribution would also approximate the unit normal
distribution and its cumulative distribution would differ only by
chance from that shown by dots in Figures 6-4 and 6-5.

However, if the statistic $(\bar{X} - \mu)\sqrt{N}/s$ were computed
for each sample of 20 cases, its distribution would be far more
nearly normal than that obtained from the samples of 5 cases, and
its cumulative distribution would differ considerably from that
shown by the crosses in Figures 6-4 and 6-5.
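
The contrast between the two ratios, and the way it fades as N grows from 5 to 20, can be explored by simulation. The sketch below is a hedged illustration rather than a reproduction of Table 6.2: it assumes a normal population with μ = 121.6 and σ = 37.15, and the function names, seed, and number of samples are arbitrary. It relies on the known result that a “Student’s” variable with n degrees of freedom has variance n/(n - 2) when n exceeds 2.

```python
import math
import random
import statistics

MU, SIGMA = 121.6, 37.15   # population values from the text; normality is assumed

def z_and_t(sample, mu=MU, sigma=SIGMA):
    """Return ((mean - mu)*sqrt(N)/sigma, (mean - mu)*sqrt(N)/s) for one sample."""
    n = len(sample)
    mean = statistics.fmean(sample)
    s = statistics.stdev(sample)          # N - 1 in the denominator
    root_n = math.sqrt(n)
    return (mean - mu) * root_n / sigma, (mean - mu) * root_n / s

def simulate(n, n_samples=5000, seed=2):
    """Variances of the two ratios over n_samples samples of n cases each."""
    rng = random.Random(seed)
    pairs = [z_and_t([rng.gauss(MU, SIGMA) for _ in range(n)])
             for _ in range(n_samples)]
    zs, ts = zip(*pairs)
    return statistics.variance(zs), statistics.variance(ts)

var_z5, var_t5 = simulate(5)      # theory: 1 and 4/2 = 2
var_z20, var_t20 = simulate(20)   # theory: 1 and 19/17, about 1.12
print(round(var_z5, 2), round(var_t5, 2), round(var_z20, 2), round(var_t20, 2))
```

The ratio that uses σ should show a variance near 1 for both sample sizes, while the ratio that uses s should be clearly more variable for N = 5 than for N = 20.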


The ratio of the difference between two means to the sample
estimate of the standard error of that difference has under certain
circumstances “Student’s” distribution. Problems employing this
ratio will be discussed in Chapter 7. The statistic
$(\bar{X}_1 - \bar{X}_2)/s_{\bar{X}_1 - \bar{X}_2}$ was computed on
line 4 of Worksheet III. Under the circumstances in which these
samples were obtained, this ratio should have “Student’s”
distribution with 8 degrees of freedom. It has been computed for
200 pairs of samples and the results recorded in Table 6.3 and
graphed in Figure 6-6.

Fig. 6-6. Cumulative “Student’s” distribution for 8 degrees of
freedom and cumulative percents of distribution of
$t = (\bar{X}_1 - \bar{X}_2)/s_{\bar{X}_1 - \bar{X}_2}$ for 200 pairs
of samples of 5 cases each. (Horizontal axis: value of t.)


TABLE 6.3 Distribution of $t = (\bar{X}_1 - \bar{X}_2)/s_{\bar{X}_1 - \bar{X}_2}$
for 200 Pairs of Samples of 5 Cases Each

      t            f   |        t             f   |        t             f

  4.1 to 4.3       1   |   1.4 to 1.6         5   |  -1.3 to -1.1       11
  3.8 to 4.0           |   1.1 to 1.3        12   |  -1.6 to -1.4        6
  3.5 to 3.7       1   |    .8 to 1.0        14   |  -1.9 to -1.7        4
  3.2 to 3.4       1   |    .5 to  .7        16   |  -2.2 to -2.0        3
  2.9 to 3.1       1   |    .2 to  .4        16   |  -2.5 to -2.3        5
  2.6 to 2.8           |   -.1 to  .1        35   |  -2.8 to -2.6        1
  2.3 to 2.5       1   |   -.4 to -.2        27   |  -3.1 to -2.9        1
  2.0 to 2.2       6   |   -.7 to -.5        17   |  -3.4 to -3.2
  1.7 to 1.9       4   |  -1.0 to -.8        11   |  -3.7 to -3.5        1

  N = 200        MEAN = 0        VARIANCE = 1.37




The ratios of other normally distributed statistics to the sample 
estimate of their standard errors will be considered in later chapters. 

Statistics Which Have the F Distribution. Whenever there
are two independent estimates of σ², the ratio of one estimate to
the other

(6.5)    $F = \dfrac{s_1^2}{s_2^2}$

has a sampling distribution of the form sometimes called the F
distribution or the variance-ratio distribution. The preceding
statement should be qualified by the usual assumption that
observations are drawn at random from a normal universe.
Sometimes the two estimates come from independent samples. Then
the problem may relate to a comparison of the variability in two
populations. Chapter 8 will contain problems of this nature.
Sometimes the two estimates are independent statistics obtained
from the same set of observations. A great variety of important
problems can give rise to such variances, one of the commonest
being a test of the hypothesis that several populations have the
same mean. This problem is discussed in Chapter 9.

TABLE 6.4 Distribution of $F = s_1^2/s_2^2$ for 400 Pairs of Variances, Each
Variance being Obtained from a Sample of 5 Cases

        Number   Cumu-   |         Number   Cumu-   |         Number   Cumu-
  F*      of     lative  |   F*      of     lative  |   F*      of     lative
        samples    %     |         samples    %     |         samples    %

 18.0      1    100.00   |  12.0            97.75   |   6.0            93.25
 17.7            99.75   |  11.7      1     97.75   |   5.7      5     93.25
 17.4            99.75   |  11.4            97.50   |   5.4      1     92.00
 17.1      1     99.75   |  11.1            97.50   |   5.1      2     91.75
 16.8            99.50   |  10.8      1     97.50   |   4.8      4     91.25
 16.5            99.50   |  10.5            97.25   |   4.5      3     90.25
 16.2      1     99.50   |  10.2      1     97.25   |   4.2      6     89.50
 15.9            99.25   |   9.9            97.00   |   3.9     11     88.00
 15.6            99.25   |   9.6            97.00   |   3.6      9     85.25
 15.3      1     99.25   |   9.3            97.00   |   3.3      7     83.00
 15.0            99.00   |   9.0      2     97.00   |   3.0     11     81.25
 14.7      2     99.00   |   8.7            96.50   |   2.7     12     78.50
 14.4            98.50   |   8.4            96.50   |   2.4     22     75.50
 14.1      1     98.50   |   8.1      1     96.50   |   2.1     17     70.00
 13.8            98.25   |   7.8      2     96.25   |   1.8     22     65.75
 13.5            98.25   |   7.5      1     95.75   |   1.5     27     60.25
 13.2            98.25   |   7.2            95.50   |   1.2     35     53.50
 12.9      1     98.25   |   6.9      2     95.50   |    .9     57     44.75
 12.6            98.00   |   6.6      4     95.00   |    .6     76     30.50
 12.3      1     98.00   |   6.3      3     94.00   |    .3     46     11.50

* Values of F are the upper limits of step intervals.

Fig. 6-7. Distribution of $F = s_1^2/s_2^2$ for 400 pairs of samples
of 5 cases each, and theoretical F distribution for 8 degrees of
freedom. (7 cases at upper end of distribution are not shown on
graph.) (Horizontal axis: value of F; vertical axis: probability
density.)

Let $n_1$ represent the degrees of freedom associated with the
numerator of F and $n_2$ the degrees of freedom associated with the
denominator. Then for every pair of values of $n_1$ and $n_2$ there
exists a distinct distribution. Since $s_1^2$ and $s_2^2$ are both
positive their ratio is positive and values of F extend from 0
indefinitely in the positive direction.

Fig. 6-8. Cumulative percentage distribution of 400 values of
$F = s_1^2/s_2^2$ and certain selected percentiles of the theoretic
F distribution. Circles indicate theoretical values of 10th, 25th,
50th, 90th, 95th and 97.5th percentile points. (Horizontal axis:
value of F; vertical axis: cumulative probability.)

In the sampling experiment, each of many statistics students 
divided the variance of the first sample he drew by the variance 
of the second, obtaining a value of F. In Table 6.4 are presented 
400 such sample values of F. These are shown graphically in 
Figure 6-7 together with the theoretic F distribution. Certain 
percentile values are indicated along the baseline in case you wish 
to compute F ratios from your own data and compare them with 
this distribution. The cumulative distribution from the empirical 
data of Table 6.4 has been plotted in Figure 6-8. Small circles 
indicate the values of selected percentiles from the theoretic F 
distribution. 
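
The students' procedure of dividing the variance of one sample by the variance of another can likewise be imitated. The sketch below is an illustration under stated assumptions, not the original experiment: samples of 5 cases are drawn from a normal population (σ = 37.15 is borrowed from the text, though the value of σ does not affect the ratio), and the seed and number of pairs are arbitrary. It uses two known facts about the F distribution with 4 and 4 degrees of freedom: its median is 1, and its 95th percentile is 6.39.

```python
import random
import statistics

def variance_ratio(rng, n=5, sigma=37.15):
    """F = s1^2 / s2^2 from two independent normal samples of n cases each."""
    first = [rng.gauss(0.0, sigma) for _ in range(n)]
    second = [rng.gauss(0.0, sigma) for _ in range(n)]
    return statistics.variance(first) / statistics.variance(second)

rng = random.Random(3)
ratios = [variance_ratio(rng) for _ in range(4000)]
share_below_1 = sum(f < 1.0 for f in ratios) / len(ratios)       # theory: .50
share_above_639 = sum(f > 6.39 for f in ratios) / len(ratios)    # theory: .05
print(round(share_below_1, 3), round(share_above_639, 3))
```

All simulated ratios are positive, about half should fall below 1, and roughly 5 percent should exceed 6.39, which mirrors the long right-hand tail visible in Table 6.4.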


7  Inferences Concerning the Mean or
the Difference Between Two Means

In this chapter, methods will be described for testing
hypotheses regarding μ and for calculating estimates of μ. In
addition, methods for comparing the means of two populations,
$\mu_1$ and $\mu_2$, will be considered. For these purposes the
normal distribution and “Student’s”* distribution will be the
required sampling distributions.

* See footnote on page 135.

The Assumption of a Normal Population. In this chapter
and in most of the following chapters the assumption will be made
that measures in a sample are independently drawn from a normal
population. This assumption is justified by the approximately
normal shape assumed by many empirical distributions. Studies
indicate that some departure from normality does not invalidate
the methods to be described.

Where the assumption of normality or of independence is not
made this fact will be indicated. Where it seems important to
state explicitly the assumption of normality it will be stated.

The Central Limit Theorem. A theorem of far-reaching
importance in statistics is called the Central Limit Theorem.
It states that for a wide variety of populations statistics based on
large random samples are approximately normally distributed. This
applies to nearly all populations which are likely to be considered
in practice, and to most statistics considered in this book. This is
especially true of statistics to be considered in this chapter. It is
less true of the correlation coefficient to be considered in
Chapter 10, and of the sample proportion p when P is near 1 or 0,
because their distributions depart greatly from normality unless
the sample is exceptionally large. Also, if classes are few, the
distributions of χ² and F approach a smooth χ² distribution, not
the normal, as sample size increases.
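
The tendency described by the theorem can be watched in a small simulation. The sketch below is only an illustration: the exponential population is an assumption chosen because it is strongly skewed, and the function names, seed, and sample counts are arbitrary. For such a population the skewness of the sample mean is known to be 2/√N, so it should shrink toward the zero skewness of the normal curve as N grows.

```python
import random
import statistics

def skewness(values):
    """Moment coefficient of skewness: mean cubed deviation divided by sd^3."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([(v - m) ** 3 for v in values]) / sd ** 3

def skewness_of_means(n, n_samples=4000, seed=4):
    """Skewness of the means of n_samples samples of n cases
    drawn from an exponential population (an assumed, non-normal population)."""
    rng = random.Random(seed)
    means = [statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
             for _ in range(n_samples)]
    return skewness(means)

results = {n: skewness_of_means(n) for n in (1, 5, 50)}
for n, value in results.items():
    print(n, round(value, 2))   # theory: 2 / sqrt(n)
```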




X̄ as an Estimate of μ. It was stated in Chapter 5 that
E(X̄) = μ and that, therefore, X̄ is an unbiased estimate of μ.
It is easy to see that X̄ is also a consistent estimate of μ, for

(7.1)    $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{N}}$

and so $\sigma_{\bar{X}}$ decreases as N increases. For large
samples X̄ can be considered as normally distributed about μ with
standard deviation $\sigma_{\bar{X}}$ even if the population has a
non-normal distribution. Even for relatively small samples out of
a non-normal population the mean has a distribution which is
approximately normal. Since $\sigma_{\bar{X}}$ can be made
arbitrarily small by making N sufficiently large, the probability
that X̄ deviates from μ by a given amount can also be made
arbitrarily small. The argument for the consistency of X̄
corresponds to the argument for the consistency of p given
on page 52.

There are many other possible methods of estimating μ, such
as using the median, or the mode, or the average of the 25th and
75th percentiles, or the average of the 7th and 93rd percentiles,
or the midpoint of the range, etc. In samples from a normal
population every other estimate of μ has a standard error larger
than $\sigma_{\bar{X}}$. Since the mean is the estimate of μ which
has the smallest standard error in samples from a normal
population, it is called the efficient estimate of μ for such a
population. When the population is not normal some other estimate
such as the median may have a smaller standard error than the mean
and so be more efficient. Thus X̄ provides an estimate of μ which
is unbiased, consistent, and efficient when the population is
normal.
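
The claim that the mean is more efficient than the median in normal samples can be checked empirically. The following sketch is an illustration under assumed values (σ = 5, N = 25, a normal population centered at zero, and an arbitrary seed); none of these figures comes from the text except that σ = 5, N = 25 appears among the cases of Exercise 7.1.

```python
import random
import statistics

def sampling_sd(estimator, n=25, sigma=5.0, n_samples=4000, seed=5):
    """Standard deviation of an estimator over repeated normal samples of n cases."""
    rng = random.Random(seed)
    estimates = [estimator([rng.gauss(0.0, sigma) for _ in range(n)])
                 for _ in range(n_samples)]
    return statistics.pstdev(estimates)

sd_mean = sampling_sd(statistics.fmean)      # theory: sigma / sqrt(N) = 1.0
sd_median = sampling_sd(statistics.median)   # larger; about 1.25 times sd_mean in large samples
print(round(sd_mean, 2), round(sd_median, 2))
```

Because both calls use the same seed, the mean and the median are computed from identical samples, so any difference in spread reflects the estimators and not the luck of the draw.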


EXERCISE 7.1

1. For each of the given values of N and σ, find the value of
$\sigma_{\bar{X}}$. Make a table of results and notice the way in
which $\sigma_{\bar{X}}$ decreases as N increases, σ remaining
constant. Notice also the way in which $\sigma_{\bar{X}}$ increases
as σ increases, N remaining constant.

a. σ = 5, N = 4        d. σ = 5, N = 10,000     g. σ = 10, N = 25
b. σ = 5, N = 25       e. σ = 2, N = 25         h. σ = 20, N = 25
c. σ = 5, N = 100      f. σ = 4, N = 25

2. Suppose in a population μ = 48.5 and σ² = 15.4. What is the
probability that the mean of a sample of 20 cases will be larger than 50?
Between 47 and 50? Answer the same questions if N = 100.

Tests About μ When σ Is Known. In practice this situation
is not very common for usually if μ is unknown σ is also unknown.
However it is the simplest of the situations to be considered and
a natural introduction to the others. The population of 447 cases
used in Chapter 6 provides a good illustration. Here μ = 121.6
and σ = 37.15. Suppose a student reports obtaining a sample
of 5 cases in which X̄ = 165.4. This seems very large. If his
computations are not available for examination shall it be
assumed that he has made an error?

Since the population variance is known,
$\sigma_{\bar{X}} = \sigma/\sqrt{N}$ and

(7.2)    $z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{N}} = \dfrac{\sqrt{N}(\bar{X} - \mu)}{\sigma}$

Applying Formula (7.2) to the reported data we have

    $z = \dfrac{\sqrt{5}(165.4 - 121.6)}{37.15} = 2.64$

Tables of normal probability indicate that the area outside the
two ordinates at z = ±2.64 is .008. The hypothesis that 165.4
is a purely random deviate from the expected value of 121.6 is
scarcely tenable and it would be wise to examine the student’s
work to see whether he has misunderstood the procedure or made
a mistake in computation. However it must be noted that if a
thousand samples are drawn about 8 of them might be expected
to have | z | > 2.64 and one should not be too confident that in
any particular sample a large departure from the expected value
is due to a mistake.
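
The arithmetic of this test can be retraced in a few lines. The sketch below is offered as a check on the computation, not as part of the original text; it replaces the printed normal table with the error function from Python's standard library, and the function names are illustrative.

```python
import math

def normal_cdf(z):
    """Cumulative probability of the unit normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def z_statistic(sample_mean, mu, sigma, n):
    """z = sqrt(N) * (mean - mu) / sigma, as in Formula (7.2)."""
    return math.sqrt(n) * (sample_mean - mu) / sigma

z = z_statistic(165.4, 121.6, 37.15, 5)
two_tail_area = 2.0 * (1.0 - normal_cdf(abs(z)))
print(round(z, 2), round(two_tail_area, 3))   # prints 2.64 0.008
```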

“Student’s” Distribution. When σ is not known,
$\sigma_{\bar{X}}$ must be estimated from sample data by the formula

(7.3)    $s_{\bar{X}} = \dfrac{s}{\sqrt{N}}$

If observations are drawn from a normal population, the ratio
$(\bar{X} - \mu)/s_{\bar{X}}$ has what is called “Student’s”
Distribution, and is customarily denoted by the letter t to
distinguish it from z which customarily denotes a variable with
unit normal distribution.

(7.4)    $t = \dfrac{\bar{X} - \mu}{s/\sqrt{N}} = \dfrac{\sqrt{N}(\bar{X} - \mu)}{s}$

“Student’s” distribution is sometimes called the t-distribution.
Because this chapter presents the first situation in which
“Student’s” distribution is needed, it is necessary now to pause
for a discussion of the form of that distribution and the tables
available.

“Student’s” distribution is symmetrical with maximum ordinate
at t = 0. Its shape changes as the number of degrees of
freedom changes. Therefore “the” curve is in reality a whole
family of curves. As n, the number of degrees of freedom,
increases, the curve approaches the normal form so that for n as
large as 30 or 40 the normal probability table can be used
satisfactorily in most problems. By calculus it can be shown that
the formula for the normal curve is the limit approached by the
formula for “Student’s” distribution as n becomes very large.
When n is very small the area in the tails of “Student’s”
distribution is considerably greater than the corresponding area
in the tails of the normal distribution, as can be seen in
Figures 7-1 and 7-2 where the normal distribution and “Student’s”
distribution with n = 1 are shown together. The same relation has
already been seen in Figures 6-4 and 6-5.

Fig. 7-1. The normal distribution (N) and “Student’s” distribution
with 1 degree of freedom (S).

Fig. 7-2. Cumulative normal distribution (N) and cumulative curve
of “Student’s” distribution with 1 degree of freedom (S).
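
The heavier tails for n = 1 can be put in numbers. The sketch below is an added illustration, not taken from the text: it uses the known closed form for “Student’s” distribution with 1 degree of freedom, whose cumulative curve is 1/2 + arctan(t)/π, and the error function for the normal curve. The function names are illustrative.

```python
import math

def normal_two_tail(c):
    """Area of the unit normal curve outside the ordinates at -c and +c."""
    return 1.0 - math.erf(c / math.sqrt(2.0))

def student_1df_two_tail(c):
    """Same area for "Student's" distribution with 1 degree of freedom,
    whose cumulative curve is 1/2 + arctan(t)/pi."""
    return 1.0 - 2.0 * math.atan(c) / math.pi

tails = [(c, normal_two_tail(c), student_1df_two_tail(c)) for c in (1.0, 2.0, 3.0)]
for c, normal_area, student_area in tails:
    print(c, round(normal_area, 4), round(student_area, 4))
```

At every cut-off the second area is the larger, and the gap widens rapidly in relative terms as the ordinates move outward.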

In Table IX to be found in the Appendix, each row belongs to
a different probability curve. These curves are distinguished by
the number of degrees of freedom specified in the column at the
left. The tabular entries are values of t. In the column headed
$t_{.90}$ each entry is the 90th percentile of the probability
distribution to which it belongs. A similar description applies to
the other columns. Notice that when the subscript for t is less
than .50 the value of t is negative. Thus $t_{.10} = -t_{.90}$,
$t_{.005} = -t_{.995}$, etc. The entries in the last row of the
table are percentile values for the normal curve.


EXERCISE 7.2

1. Study the following statements, translate them into words.
Sketching a probability curve may help your understanding.

$P(t < t_{.95}) = .95$          $P(t_{.05} < t < t_{.97}) = .92$
$P(t < t_{.01}) = .01$          $P(t_{.06} < t < t_{.90}) = .84$
$P(t > t_{.99}) = .01$          $P(t > t_{.99}$ or $t < t_{.01}) = .02$
$P(t > t_{.07}) = .93$          $P(t > t_{.95}$ or $t < t_{.05}) =$

2. Examine Table IX to identify the entries which support the
following statements:

P(t < 1.4 | n = 8) = .90           P(t > -1.34 | n = 15) = .90
P(t < -2.31 | n = 8) = .025        P(-1.37 < t < 1.37 | n = 10) = .80
P(t < -1.35 | n = 13) = .10        P(-2.18 < t < 2.18 | n = 12) = .95
P(t > 3.75 | n = 4) = .01          P(-3.00 < t < 3.00 | n = 7) = .98

3. Verify the following statements by reference to Table IX.

a. For n = 11, P(t > 2.2) = .025
               P(t < 2.2) = .975
               P(-2.2 < t < 2.2) = .95

b. If ordinates are to be drawn to cut off the extreme 1% of the area
in each tail of the curve
for n = 5, they should be drawn at -3.36 and +3.36
for n = 10, they should be drawn at -2.76 and +2.76
for n = 25, they should be drawn at -2.48 and +2.48
for the normal curve, they should be drawn at -2.326 and +2.326

c. As the eye follows down any selected column of the table, the
entries grow smaller as n increases, and they appear to converge
upon the entries in the last row of the table.

d. The entries in the row headed n = ∞ are identical with
corresponding entries in Table II.

4. If ordinates are to be drawn to cut off the extreme 10% of the area
under the curve, that is the extreme 5% in each tail, where should they
be drawn if

a. n = 3        b. n = 16        c. n = 500

5. If ordinates are to be drawn to enclose the middle 80% of the area
under the curve, where should they be drawn if

a. n = 1        b. n = 10        c. n = 400

6. If an observed value of t has been found to be 2.8, what is the best
reading which Table IX furnishes for P(t < -2.8 or t > 2.8) when

a. n = 1        b. n = 3        c. n = 500

Tests About μ when σ Is Unknown. Suppose a sample of
6 cases has X̄ = 12.3 and s² = 4.8. It is desired to test the
hypothesis μ = 15 with a two-sided test at level α = .05. The
statistic $s_{\bar{X}} = s/\sqrt{N}$ must be used. The ratio

    $t = \dfrac{(\bar{X} - \mu)\sqrt{N}}{s}$

will have “Student’s” distribution with 5 degrees of freedom.
Substituting the given data we have

    $t = \dfrac{(12.3 - 15)\sqrt{6}}{\sqrt{4.8}} = -3.02$

To complete the test of significance the observed value of
t = -3.02 must be compared with tabulated entries in Table IX,
where the critical region for the two-sided test with α = .05 is
seen to be t < -2.57 and t > 2.57. The observed value falls in
this critical region and the hypothesis is rejected.
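
The steps of this test can be written out as a short computation. The sketch below is a hedged illustration: the critical value 2.57 is simply copied from the text's reading of Table IX rather than computed, and the function and constant names are illustrative.

```python
import math

def t_statistic(sample_mean, mu0, variance, n):
    """t = (mean - mu0) * sqrt(N) / s, where s^2 is the sample variance."""
    return (sample_mean - mu0) * math.sqrt(n) / math.sqrt(variance)

T_CRITICAL_5DF = 2.57    # two-sided .05 critical value for 5 d.f., as read from Table IX

t = t_statistic(12.3, 15.0, 4.8, 6)
reject = abs(t) > T_CRITICAL_5DF
print(round(t, 2), reject)   # prints -3.02 True
```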

Confidence Interval for μ.* Formulas for calculating
confidence intervals for μ will be developed both for the situation
when σ is known and when it is unknown. The confidence
coefficient will be taken as 1 - α where α is some small value
such as .01 or .05, so that 1 - α is .99 or .95.

* Before studying this section, the reader may find it profitable to reread the
discussion on confidence intervals in Chapter 3.

Consider first the situation when σ is known. Then the
following relation holds:

(7.5)    $P\left(z_{\alpha/2} < \dfrac{(\bar{X} - \mu)\sqrt{N}}{\sigma} < z_{1-\alpha/2}\right) = 1 - \alpha$

when X is normally distributed or when X is not normally
distributed but N is large. Simple algebraic manipulation on the
inequality in parentheses, together with the symmetry relation
$z_{\alpha/2} = -z_{1-\alpha/2}$, yields another expression with
the same probability:

(7.6)    $P\left(\bar{X} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{N}} < \mu < \bar{X} + z_{1-\alpha/2}\dfrac{\sigma}{\sqrt{N}}\right) = 1 - \alpha$

The inequality within the parentheses provides confidence
limits for μ which are appropriate under the conditions stated,
namely that X is normally distributed and σ is known. Written
with the symbol X̄ occurring in the limits of the interval, it is
readily understood that μ is a fixed, though usually unknown,
value and that it is the limits of the interval which are subject to
sampling variation. It should be already understood that the
probability given in (7.6) is the probability that X̄ for a
randomly selected sample will simultaneously satisfy the two
inequalities

    $\bar{X} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{N}} < \mu$  and  $\bar{X} + z_{1-\alpha/2}\dfrac{\sigma}{\sqrt{N}} > \mu$.

After a sample has been drawn, X̄ computed from it, and the two
limits obtained in numerical form, such as, say, 17.3 and 22.5,
then the inequality 17.3 < μ < 22.5 is either true or not true. The
probability of (7.6) does not apply to it. However the probability
statement of (7.6) provides us with a measurable degree of
confidence that such a pair of values obtained from a sample will
contain μ in, say, 95% of all determinations. We may express
that degree of confidence by such a statement as

    C(17.3 < μ < 22.5) = .95

When σ is unknown, a formula analogous to (7.6) is written with
s in place of σ, and t in place of z:

(7.7)    $P\left(\bar{X} + t_{\alpha/2}\dfrac{s}{\sqrt{N}} < \mu < \bar{X} + t_{1-\alpha/2}\dfrac{s}{\sqrt{N}}\right) = 1 - \alpha$

The number of degrees of freedom associated with t is N - 1.
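
Formula (7.7) reduces to a two-line computation once the tabled value of t is in hand. The sketch below is an illustration, not part of the original text: it applies the formula to Sample 1 of Table 7.1 (X̄ = 106.6, s/√N = 19.5, N = 5), taking 2.776 as the 97.5th percentile of “Student’s” distribution with 4 degrees of freedom. The function name is illustrative.

```python
def t_interval(sample_mean, std_error, t_upper):
    """Limits mean + t_(alpha/2) * s/sqrt(N) and mean + t_(1 - alpha/2) * s/sqrt(N),
    using the symmetry t_(alpha/2) = -t_(1 - alpha/2)."""
    return sample_mean - t_upper * std_error, sample_mean + t_upper * std_error

# Sample 1 of Table 7.1: mean 106.6, s/sqrt(N) = 19.5, N = 5.
T_975_4DF = 2.776   # 97.5th percentile of "Student's" distribution, 4 degrees of freedom
low, high = t_interval(106.6, 19.5, T_975_4DF)
print(round(low, 1), round(high, 1))   # prints 52.5 160.7
```

The printed limits agree with the interval 52.5 < μ < 160.7 entered for Sample 1 under 1 - α = .95 in Table 7.1.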

To help the reader develop a more vivid concept of the
fluctuation of the interval limits from sample to sample, data from
the four samples enumerated on Worksheet I in Chapter 6 have
been used to compute four confidence intervals with confidence
coefficient .95. These interval estimates are shown in the last
column of Table 7.1 under the heading 1 - α = .95. In this case
μ is known to be 121.6 and one can see that it is contained in all
four intervals. In the same table are also shown intervals with
confidence coefficient .50 made from the same four samples and it
may be noted that three of these contain μ and one does not.
Intervals with confidence coefficients .95, .90 and .50 from Table
7.1 and intervals with confidence coefficient .10 not in that table
are shown graphically in Figure 7-3. In each group the four
unbroken lines represent intervals obtained from the four samples
with N = 5 and the dotted line represents an interval obtained
from the composite sample with N = 20. The small tick in the
middle of each line represents the value of X̄. Examining this
graph one understands why the confidence coefficient is in
practice never taken as a small number such as .10 or even .50.

Examination of Table 7.1 and Figure 7-3 leads to the
following interpretations:

1. If N and s are fixed, the narrower the interval the less
assurance there is that the population value is contained within it.
(Look at the four intervals for any given sample.)

2. If N and the confidence coefficient are fixed, the smaller
the value of s the narrower will the interval be. (Look at the four
unbroken lines in any one group, remembering that for these
samples $s_3 < s_1 < s_4 < s_2$.)

3. As N increases, if the confidence coefficient is fixed, the
interval becomes narrower. (In any one group compare the dotted
line for N = 20 with four unbroken lines for N = 5.)

4. Samples of 5 cases provide very unreliable estimates. (Note
that when there is high confidence that the population value lies
in the interval the intervals are wide and that when intervals are
narrow there can be little confidence that the population value
lies in them.)

5. The confidence interval for the mean μ is symmetrically
placed around the observed mean X̄. (The confidence interval
for the population variance discussed in the next chapter is not
symmetrically placed around the observed variance.)

6. If the interval were narrowed to the single point μ = X̄,
its confidence coefficient would be zero. If a confidence coefficient
of 1.00 were required, the interval would have to be infinitely wide.

Fig. 7-3. Interval estimates of μ made with confidence coefficients
of .10, .50, .90, and .95 from each of the four samples of 5 cases
and from the composite sample of 20 cases. (Samples as given on
Worksheet I.) Dotted line is for composite sample. Horizontal tic is
mean of sample. (Vertical axis: estimate of population mean μ;
groups: 1 - α = .10, .50, .90, .95.)

TABLE 7.1 Interval Estimates with Confidence Coefficients of .50, .90, and .95
Made for the Population Mean μ from the Data of the Four Samples of 5 Cases
and of the Composite Sample of 20 Cases on Worksheet I.

                                                      Interval estimate
 Sample       X̄       s     s/√N     1 - α = .50          1 - α = .90          1 - α = .95

 1          106.6    43.7   19.5     92.2 < μ < 121.0     65.0 < μ < 148.2     52.5 < μ < 160.7
 2          104.6    59.3   26.5     85.0 < μ < 124.2     48.1 < μ < 161.1     31.0 < μ < 178.1
 3          126.2    28.5   12.7    117.0 < μ < 135.4     99.1 < μ < 153.2     90.9 < μ < 161.5
 4          122.0    51.7   23.1    104.9 < μ < 139.1     72.8 < μ < 171.2     57.9 < μ < 186.1

 Composite  114.85   44.4    9.92   108.0 < μ < 121.7     97.7 < μ < 132.0     94.2 < μ < 135.5
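
The meaning of the confidence coefficient, that a stated proportion of all such intervals contain μ, can be illustrated by drawing many samples instead of four. The sketch below is an added illustration under assumptions: the population is taken to be normal with μ = 121.6 and σ = 37.15, the value 2.776 is the 97.5th percentile of “Student’s” distribution with 4 degrees of freedom, and the seed and number of samples are arbitrary.

```python
import math
import random
import statistics

MU, SIGMA = 121.6, 37.15   # population values from the text; normality is assumed
T_975_4DF = 2.776          # 97.5th percentile of "Student's" distribution, 4 d.f.

def covers(sample, t_upper=T_975_4DF, mu=MU):
    """True if the interval mean -/+ t * s/sqrt(N) contains mu."""
    half_width = t_upper * statistics.stdev(sample) / math.sqrt(len(sample))
    return abs(statistics.fmean(sample) - mu) < half_width

rng = random.Random(7)
n_samples = 4000
hits = sum(covers([rng.gauss(MU, SIGMA) for _ in range(5)]) for _ in range(n_samples))
coverage = hits / n_samples
print(round(coverage, 3))   # expected to be close to .95
```

Replacing 2.776 by a smaller tabled value, such as the one for confidence coefficient .50, should lower the observed proportion accordingly.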


EXERCISE 7.3

1. From the samples you obtained in the sampling experiment of
Chapter 6 compute a set of interval estimates by Formula (7.7) and draw
lines on Figure 7-3 to represent these intervals. You may wish to use the
same confidence coefficients employed there and thus increase the number
of lines in each group, or to use another coefficient, say .80.

2. By substituting observed values of X̄ and s and values read from
tables of “Student’s” distribution, verify the interval estimates in Table
7.1 for confidence coefficients of .95 and .90.

3. What proportion of all samples is expected to yield interval estimates
that contain μ if 1 - α = .95? Of the 4 interval estimates made from the
4 observed samples how many did contain μ? Answer the same question
for 1 - α = .90, .50, and .10.

4. If a sample of 40 cases has a mean X̄ = 27.2 and a variance s² = 13.6,
what interval estimate can be made for μ at the .99 level of confidence?
At the .95 level?

Mean of a Population of Differences between Two Measures 
for Each Individual. This is a very common situation in research. 
Sometimes each individual is measured at the beginning and again 
at the end of an experiment and the gain is then treated as the 
basic measure to be studied. Sometimes two subjects are paired 
in such a way that the pair constitutes one individual and the 


152 - The Mean or the Difference Between Two Means 


difference between scores for members of the pair becomes the 
basic measure. An illustration may be taken from Wrightstone’s 
Appraisal of Newer Elementary School Practices? In each of six 
communities he selected two schools one of which he classified 
as “progressive” and one as conservative." А test of knowledge 
of current affairs was given to children in each school and the 
school mean computed, with results as shown here. X, and X, 
indicate respectively the score of a “progressive” and of a “соп- 
servative” school, score for the school being the mean of the scores 
of all the children studied. 


Community XS xe D-X,- X, D? 
A 32.7 28.2 4.5 20.25 
B 33.1 29.1 4.0 16.00 
C 35.2 31.9 3.3 10.89 
D 36.0 33.1 2.9 8.41 
E 36.2 31.4 4.8 23.04 
F 39.1 35.3 3.8 14.44 

Toran 2123 189.0 23.3 93.03 


We must consider a population of communities in each of 
which there is a “progressive” and a “conservative” school for 
which a difference score D can be obtained. Such a population is 
conceptual. It would be an enormously difficult task to enumer- 
ate all the communities in this population and take a sample 
from them. We are interested in the hypothesis that for this 
conceptual population the expectation of all such differences is 
Zero, up = 0. To test it we shall need the observed mean and 
standard deviation D and sp for the six communities in the sample 
and the standard error of D: 


= ZD 
(7.8) D-7 

‚ _ NID: - (=D)? 
(7.9) 8р“ = NO = ry 

_ _ ‚/зь% _ ЛҮ (2D) 
(7.10) sae ve у ые 

is =D 
phe N 


Mean of a Population of Differences - 153 


which can be more easily computed by the equivalent formula 
>р 


/NZD? — (Хр)? 
МЕ 


By Formula (7.11) t= BU CUT PN - 
ee — (28.8)? 
5 


(7.11) t= 
13.3 


We find in Table IX on the row for which n = 5 that {5% = 6.86, 
and the observed value of ¢ is very much larger than this tabulated 
value. Consequently it is almost inconceivable (though not im- 
possible) that the population difference is zero. The statistical 
analysis does not indicate the cause of the difference but indicates 
strongly that the difference is positive and should not be attrib- 
uted to chance and that therefore it has meaning, is significant. 
Naturally a research worker will examine his procedures carefully 
to see if any influence other than the one he is studying could have 
crept into his data and will not assume a significant difference to 
be due to the experimental factor unless there seems no other 
reasonable way to account for it. 

The quotient $t = \bar{D}/s_{\bar{D}}$ has "Student's" distribution with
$N - 1 = 5$ degrees of freedom. Hence an interval estimate at
confidence level .99 may be made for the population difference by
the general plan of Formula (7.7) as follows:


$3.88 - (4.032)(.291) < \mu_D < 3.88 + (4.032)(.291)$
$2.71 < \mu_D < 5.05$


Therefore the mean advantage enjoyed by the “progressive” 
school over the “conservative” with respect to knowledge of cur- 
rent affairs on the part of the pupils may be estimated at some- 
where between 2.7 and 5.1 points. 
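
The computation for the six communities can be retraced with a short sketch. It assumes only the totals quoted in the text (sum of D = 23.3, sum of D squared = 93.03, N = 6) and the tabled value 4.032; the function name `paired_t` and the constant name are illustrative choices, not part of the original.

```python
import math

def paired_t(sum_d, sum_d2, n):
    """Return (mean difference, standard error of the mean, t) from sums of D and D**2."""
    mean_d = sum_d / n                                  # Formula (7.8)
    var_d = (n * sum_d2 - sum_d ** 2) / (n * (n - 1))   # Formula (7.9)
    se_mean = math.sqrt(var_d / n)                      # Formula (7.10)
    return mean_d, se_mean, mean_d / se_mean

mean_d, se_mean, t = paired_t(sum_d=23.3, sum_d2=93.03, n=6)

# Tabled value of t for 5 degrees of freedom, as used in the text
T_TABLE_DF5 = 4.032
lower = mean_d - T_TABLE_DF5 * se_mean
upper = mean_d + T_TABLE_DF5 * se_mean
print(round(t, 1), round(lower, 2), round(upper, 2))
```

Because the sketch carries more decimals than the hand computation, the upper limit may differ from the printed one in the second decimal place.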

Authors do not usually publish the raw scores on which their 
computations are based, although such data would often be of 
great value to other persons working in the same field. Suppose 
a published study gives the mean and variance of a sample at the 
beginning and at the end of an experimental period and also gives 
the correlation coefficient between scores at the beginning and 
at the end but does not give individual scores. The differences 
$D_i = X_{2i} - X_{1i}$ cannot be computed as was done for the Wrightstone
data and so their variance cannot be computed directly.
However the mean and variance of these differences can be obtained
from the mean and variance of each set of scores and the
correlation * between them, as


(7.12) $\bar{D} = \bar{X}_2 - \bar{X}_1$

(7.13) $s_D^2 = s_1^2 - 2r_{12}s_1s_2 + s_2^2$


An illustration may be taken from a study by F. L. Westover.” 
This is a study of three methods of improving reading speed and 
comprehension among college students. For our present purposes 
we shall deal with one small aspect of this study only. For the 
45 students who were taught by the method of controlled eye 
movements the mean and standard deviation on an initial test 
of rate of reading were as given on page 42 of Westover's study: 


                        $\bar{X}$      $s$
$X_1$ = Initial test    39.20    5.97
$X_2$ = Final test      43.07    7.85
$D$ = Gain               3.87


The coefficient of correlation between the tests was $r_{12} = .51$.
Then $s_D^2 = (5.97)^2 + (7.85)^2 - 2(.51)(5.97)(7.85) = 49.46$

and $s_{\bar{D}}^2 = \dfrac{49.46}{45} = 1.099$

and $t = \dfrac{3.87}{\sqrt{1.099}} = 3.7$


There can be little doubt that use of this method of instruction 
results in improved reading when applied to individuals like those 
in this study. 
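
The same steps can be sketched in code for the case where only summary statistics are published. The inputs are the Westover figures quoted above; the function name `gain_t` is an illustrative assumption.

```python
import math

def gain_t(mean1, sd1, mean2, sd2, r12, n):
    """t for the mean gain, from means, standard deviations and their correlation."""
    mean_d = mean2 - mean1                                   # Formula (7.12)
    var_d = sd1 ** 2 - 2 * r12 * sd1 * sd2 + sd2 ** 2        # Formula (7.13)
    se_mean = math.sqrt(var_d / n)                           # standard error of the mean gain
    return mean_d, var_d, mean_d / se_mean

mean_d, var_d, t = gain_t(mean1=39.20, sd1=5.97, mean2=43.07, sd2=7.85, r12=0.51, n=45)
print(round(mean_d, 2), round(var_d, 2), round(t, 1))
```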

Test of Hypotheses About $\mu_1 - \mu_2$. A. When $\sigma_1$ and $\sigma_2$ are
Known. Assume that a sample of $N_1$ cases has been drawn from
a normal population with mean $\mu_1$ and variance $\sigma_1^2$, and another
independent sample of $N_2$ cases has been drawn from a normal
population with mean $\mu_2$ and variance $\sigma_2^2$. The two sample means
$\bar{X}_1$ and $\bar{X}_2$ are distributed with means $\mu_1$ and $\mu_2$ and standard
deviations $\sigma_1/\sqrt{N_1}$ and $\sigma_2/\sqrt{N_2}$. The mean difference $\bar{X}_1 - \bar{X}_2$ is
distributed normally with mean $\mu_1 - \mu_2$ and variance


(7.14) $\sigma^2_{\bar{X}_1 - \bar{X}_2} = \dfrac{\sigma_1^2}{N_1} + \dfrac{\sigma_2^2}{N_2}$


* Readers who have not become at least superficially acquainted with the cor- 
relation coefficient in an introductory course might defer this section until they have 
studied Chapter 10, or might read it now taking the concept of correlation on faith. 


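Then the quotient

(7.15) $z = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{N_1} + \dfrac{\sigma_2^2}{N_2}}}$

has the unit normal distribution.


When the hypothesis tested is that $\mu_1 - \mu_2 = 0$, Formula (7.15)
reduces to

(7.16) $z = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sigma_1^2}{N_1} + \dfrac{\sigma_2^2}{N_2}}}$


In the special case when the two populations have the same
variance, $\sigma_1^2 = \sigma_2^2 = \sigma^2$, the variance of the mean difference becomes

(7.17) $\sigma^2_{\bar{X}_1 - \bar{X}_2} = \sigma^2\left(\dfrac{N_1 + N_2}{N_1 N_2}\right)$

and the ratio used to test the null hypothesis $\mu_1 - \mu_2 = 0$ can be
reduced to the form

(7.18) $z = \dfrac{\bar{X}_1 - \bar{X}_2}{\sigma}\sqrt{\dfrac{N_1 N_2}{N_1 + N_2}}$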


B. When $\sigma_1$ and $\sigma_2$ are Unknown but Presumed Equal. In a
study which has as its aim the comparison of the means of two
populations, both the means and the standard deviations must
ordinarily be estimated from the samples. To illustrate the type
of data which may be obtained in such a study consider the following
unpublished data from scores on a mathematics test obtained
from a sample of 12 full-time women students at Barnard
College and a sample of 11 part-time women students enrolled in
the School of General Studies at Columbia. The means were
respectively 82.08 and 72.64, and it is desired to test the hypothesis
that the population means are equal, $\mu_1 - \mu_2 = 0$.

Such a test is frequently carried out by using Formula (7.16), 
substituting sample standard deviations for population standard 
deviations, and referring the results to a normal probability table. 
However this substitution produces a statistic in which the de- 
nominator as well as the numerator is subject to sampling error. 

If the sample variances are not inconsistent with the hypothesis
that population variances are equal, the test for which will be
described in Chapter 8, a more satisfactory procedure is to estimate
this common variance and to replace $\sigma^2$ in Formula (7.17)
by that estimate. An unbiased estimate of $\sigma^2$ based on data from
2 samples is provided by

(7.19) $s^2 = \dfrac{\sum_{i=1}^{N_1}(X_{i1} - \bar{X}_1)^2 + \sum_{i=1}^{N_2}(X_{i2} - \bar{X}_2)^2}{N_1 + N_2 - 2}$


When raw scores are available, Formula (7.19) may be conveniently
reduced to

(7.20) $s^2 = \dfrac{\sum X_1^2 - \dfrac{(\sum X_1)^2}{N_1} + \sum X_2^2 - \dfrac{(\sum X_2)^2}{N_2}}{N_1 + N_2 - 2}$


When raw scores are not available but standard deviations or
variances are, the formula may be written as

(7.21) $s^2 = \dfrac{(N_1 - 1)s_1^2 + (N_2 - 1)s_2^2}{N_1 + N_2 - 2}$

These various formulas are algebraically equivalent but if raw
scores are at hand it would be obviously inefficient to compute $s_1^2$
and $s_2^2$ and insert them in Formula (7.21) because the routine of (7.20)
involves less rounding error. After estimating the common population
variance by Formula (7.20) or (7.21), the obtained estimate
is substituted for $\sigma^2$ in Formula (7.17) to obtain an estimate of
the variance of the difference between the two means:


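(7.22) $s^2_{\bar{X}_1 - \bar{X}_2} = s^2\left(\dfrac{N_1 + N_2}{N_1 N_2}\right)$

Then when $\mu_1 - \mu_2 = 0$, the formula for $t$ becomes

(7.23) $t = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{s^2\left(\dfrac{N_1 + N_2}{N_1 N_2}\right)}}$

and this has "Student's" distribution with $N_1 + N_2 - 2$ degrees of freedom.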

Now let us return to the problem concerning the comparison
of Barnard and general studies students. For each group we need
to know the values of $N$, $\sum X$ and $\sum X^2$. These data are


                          $N$    $\sum X$    $\sum X^2$    $\bar{X}$
Barnard group             12     985      83691     82.08
General studies group     11     799      62149     72.64


Substituting these numbers in Formula (7.20) gives

$s^2 = \dfrac{83691 - \dfrac{(985)^2}{12} + 62149 - \dfrac{(799)^2}{11}}{12 + 11 - 2} = 331.02$

From Formula (7.23) we then have

$t = \dfrac{82.08 - 72.64}{\sqrt{331.02\left(\dfrac{12 + 11}{(12)(11)}\right)}} = 1.24$


Since $t = 1.24$ is well within the customary region of acceptance,
we accept the hypothesis $\mu_1 = \mu_2$ and say the Barnard
students have not been shown to differ from general studies 
students in respect to scores on this arithmetic test. It must 
however be noted that when samples are small and variability 
large, the observed difference must be very large to appear signif- 
icant. The failure to find a significant difference may be due to 
the small number of cases examined rather than to the equality 
of population means. 
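
As a check on the arithmetic, the pooled-variance routine can be sketched from the raw sums alone. The inputs are the N, sum of X and sum of X squared given for the two groups; the function name `pooled_t` is an illustrative assumption.

```python
import math

def pooled_t(n1, sum1, sumsq1, n2, sum2, sumsq2):
    """Pooled variance, t and degrees of freedom from raw sums (Formulas 7.20, 7.22, 7.23)."""
    ss1 = sumsq1 - sum1 ** 2 / n1                 # sum of squared deviations, sample 1
    ss2 = sumsq2 - sum2 ** 2 / n2                 # sum of squared deviations, sample 2
    df = n1 + n2 - 2
    s2 = (ss1 + ss2) / df                         # pooled estimate of the common variance
    se = math.sqrt(s2 * (n1 + n2) / (n1 * n2))    # standard error of the mean difference
    t = (sum1 / n1 - sum2 / n2) / se
    return s2, t, df

s2, t, df = pooled_t(12, 985, 83691, 11, 799, 62149)
print(round(s2, 2), round(t, 2), df)
```

The pooled variance comes out near 331 and t near 1.24 on 21 degrees of freedom, in line with the conclusion reached above.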

C. When $\sigma_1$ and $\sigma_2$ are Unknown but Presumed Unequal and
Samples are Large. If the test for equality of the variance to be
described in Chapter 8 has been applied and indicates that it is 
not reasonable to consider the two populations as equally varia- 
ble, then estimating a common variance as described in the pre- 
ceding section would be inappropriate. In this situation the choice 
of the method for estimating the standard error of the mean dif- 
ference depends upon the size of the samples. 

If the samples are not small, say 30 or more cases in each, the 
standard error of the mean difference may be computed as 


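(7.24) $s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\dfrac{s_1^2}{N_1} + \dfrac{s_2^2}{N_2}}$

(7.25) Then $t = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{N_1} + \dfrac{s_2^2}{N_2}}}$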


may be referred to a table of normal probability because n is so 
large that the statistic has a distribution approximating the 
normal. 

D. When $\sigma_1$ and $\sigma_2$ are Unknown but Presumed Unequal and
Samples are Small. Suppose samples of size $N_1$ and $N_2$ are drawn
from normal populations with unknown means and variances,
and suppose it appears unreasonable to assume that $\sigma_1 = \sigma_2$. (A
test for the hypothesis $\sigma_1 = \sigma_2$ is given in Chapter 8.) To test the
hypothesis $\mu_1 = \mu_2$ set $\mu_1 - \mu_2 = 0$ in Formula (7.25) and the resulting
statistic has "Student's" distribution with degrees of freedom
given by the expression

(7.26) $n = \dfrac{\left(\dfrac{s_1^2}{N_1} + \dfrac{s_2^2}{N_2}\right)^2}{\left(\dfrac{s_1^2}{N_1}\right)^2\dfrac{1}{N_1 + 1} + \left(\dfrac{s_2^2}{N_2}\right)^2\dfrac{1}{N_2 + 1}} - 2$

The value of $n$ given by this formula will seldom be an integer,
but usually a good approximation can be obtained by using the
nearest integer.


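Example: Suppose the sample data are
$N_1 = 10$, $\bar{X}_1 = 35.3$, $s_1^2 = 8$, $\alpha = .05$
$N_2 = 8$, $\bar{X}_2 = 31.1$, $s_2^2 = 24$


Then $\dfrac{s_1^2}{N_1} = .8$ and $\dfrac{s_2^2}{N_2} = 3.0$

$t = \dfrac{35.3 - 31.1}{\sqrt{.8 + 3.0}} = \dfrac{4.20}{1.949} = 2.155$

$n = \dfrac{(.8 + 3.0)^2}{\dfrac{(.8)^2}{11} + \dfrac{(3.0)^2}{9}} - 2 = 11.6$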


If n be taken as 11, the region of acceptance is $-2.20 < t < 2.20$;
if n be taken as 12, the region of acceptance is $-2.18 < t < 2.18$;
and in either case the statistic falls in the region of acceptance. 
Interpolation is unnecessary to reach a decision. 

Formula (7.26) is an approximation to a longer formula given 
by Welch in the reference at the end of this chapter. 
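
The example can be retraced with a short sketch. It assumes that Formula (7.26) divides each squared term by N + 1 and subtracts 2 at the end, a reading which reproduces the value of n between 11 and 12 discussed above; the function name is illustrative.

```python
import math

def unequal_variance_t(mean1, var1, n1, mean2, var2, n2):
    """t for H: mu1 = mu2 without pooling, and the approximate degrees of freedom."""
    u1 = var1 / n1                     # estimated variance of the first sample mean
    u2 = var2 / n2                     # estimated variance of the second sample mean
    t = (mean1 - mean2) / math.sqrt(u1 + u2)                              # Formula (7.25), mu1 - mu2 = 0
    df = (u1 + u2) ** 2 / (u1 ** 2 / (n1 + 1) + u2 ** 2 / (n2 + 1)) - 2   # Formula (7.26), as assumed
    return t, df

t, df = unequal_variance_t(35.3, 8, 10, 31.1, 24, 8)
print(round(t, 3), round(df, 1))
```

Since the resulting n is fractional, either neighboring integer may be used to enter the table, as the text observes.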

E. When $\mu_1$ and $\mu_2$ are Themselves Differences. Sometimes a research
worker has a problem in which he must compare the dif-
ference of two means for group A with the difference of two similar 
means for group B, the two groups being independent. 

If the difference between the two means for one group is based 
on two measures of the same individual, the difference can be 
treated in the manner described on page 151. Thus if the problem 
were to find whether children taught by Method I gained more 
than children taught by Method II during a specified period, the 
first step would be to find the gain for each child, as the difference 
between his initial and final scores. Suppose that difference is
called D. Then $\bar{X}_D$ and $s_D$ are computed for each group. The
hypothesis $\mu_{D_1} - \mu_{D_2} = 0$ is then tested by the methods described
in sections A, B, C, or D, whichever is appropriate.

Consider, however, the situation in which no pairing of scores 
is possible. Suppose a difference between sexes has been found 
at one age level and another difference found at another age level. 
To formalize the statement of the hypothesis, it will be assumed 
that there are four populations, one at each age level for each sex 
and that each population is normally distributed. It will be 
further assumed that each population has the same standard deviation,
$\sigma$, but that the four means may possibly be different.
For convenience we shall call the four populations A, B, C and D
as in the adjacent display. Suppose now that a random sample
is taken from each population. These samples need not be of the
same size. For each sample the statistics $\sum X$ and $\sum X^2$ are computed,
as in the adjacent display.


TABLE 7.2 Statistics Required for Comparison of Two
Mean Differences


Population               Population values    Sample statistics

A. Male, aged 10         $\mu_A$, $\sigma$             $N_A$, $\sum X_A$, $\sum X_A^2$
B. Female, aged 10       $\mu_B$, $\sigma$             $N_B$, $\sum X_B$, $\sum X_B^2$
C. Male, aged 16         $\mu_C$, $\sigma$             $N_C$, $\sum X_C$, $\sum X_C^2$
D. Female, aged 16       $\mu_D$, $\sigma$             $N_D$, $\sum X_D$, $\sum X_D^2$


An estimate of $\sigma^2$ based on raw data is given by

(7.27) $s^2 = \dfrac{\sum X_A^2 + \sum X_B^2 + \sum X_C^2 + \sum X_D^2 - \left[\dfrac{(\sum X_A)^2}{N_A} + \dfrac{(\sum X_B)^2}{N_B} + \dfrac{(\sum X_C)^2}{N_C} + \dfrac{(\sum X_D)^2}{N_D}\right]}{N_A + N_B + N_C + N_D - 4}$

If raw data are not available but the variances are, then the same
result can be obtained by

(7.28) $s^2 = \dfrac{(N_A - 1)s_A^2 + (N_B - 1)s_B^2 + (N_C - 1)s_C^2 + (N_D - 1)s_D^2}{N_A + N_B + N_C + N_D - 4}$

If the hypothesis to be tested is that $(\mu_A - \mu_B) = (\mu_C - \mu_D)$ then
the statistic

(7.29) $t = \dfrac{(\bar{X}_A - \bar{X}_B) - (\bar{X}_C - \bar{X}_D)}{s\sqrt{\dfrac{1}{N_A} + \dfrac{1}{N_B} + \dfrac{1}{N_C} + \dfrac{1}{N_D}}}$

has "Student's" distribution with $N_A + N_B + N_C + N_D - 4$ degrees
of freedom.

A comparison of differences of means will be treated in Chapter 
14 under the topic of interaction. 
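
A minimal sketch of the comparison of two mean differences follows. It assumes that the standard error of the contrast is the pooled standard deviation times the square root of the sum of the reciprocals of the four sample sizes, which is the usual result for independent means with a common variance. The summary statistics are hypothetical, chosen only to exercise the routine.

```python
import math

def difference_of_differences_t(groups):
    """groups maps 'A', 'B', 'C', 'D' to (n, mean, variance); returns (t, degrees of freedom)."""
    df = sum(n for n, _, _ in groups.values()) - 4
    # Pooled variance from the four sample variances
    pooled = sum((n - 1) * var for n, _, var in groups.values()) / df
    # Standard error of (mean_A - mean_B) - (mean_C - mean_D)
    se = math.sqrt(pooled * sum(1 / n for n, _, _ in groups.values()))
    mean = {key: m for key, (_, m, _) in groups.items()}
    t = ((mean["A"] - mean["B"]) - (mean["C"] - mean["D"])) / se
    return t, df

# Hypothetical values, not taken from any study
example = {
    "A": (20, 52.0, 90.0),
    "B": (25, 50.0, 110.0),
    "C": (22, 61.0, 100.0),
    "D": (18, 55.0, 95.0),
}
t, df = difference_of_differences_t(example)
print(round(t, 3), df)
```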


EXERCISE 7.4 

1. Terman and Miles in Sex and Personality ?? give masculinity ratings 
for a group of gifted boys with intelligence quotients of 140 and above and 
for a control group of unselected boys, as follows: 


                 $N$     $\bar{X}$      $s$
Gifted boys     290    15.22    1.45
Controls        161    14.90    1.49


Do the data give warrant for asserting that gifted boys tend to have higher 
masculinity ratings than others? 

2. In the study by Westover *! previously quoted the following data 
were obtained for the 45 students taught by the use of special practice 
exercises: 


                                     $\bar{X}$      $s$
Initial score on vocabulary test    55.96    4.92    $r_{12} = .75$
Final score on vocabulary test      57.44    7.56


Test the hypothesis that increase in mean score is due to chance. 

3. Long" reports a study of a comparison of the motor abilities of deaf 
and hearing children. Deaf subjects were children in residence at the 
Institution for the Improved Instruction of Deaf Mutes. These were 
paired with hearing children of the same age and sex. No subjects in 
either group had known defects other than deafness which might affect 
motor performance. Seven tests of motor discrimination were given to all 
subjects and the results analyzed separately for the boys and the girls. 

In Table 7.3 is the list of raw scores for 10 out of 37 girl pairs on 
a test of balance and on a test of grip for the average of the two hands. 
Scores of deaf children are marked D and scores of hearing children
marked H.


TABLE 7.3 Scores on Test of Grip and Test of Balance by 10 Deaf Girls (D)
and 10 Hearing Girls (H) Paired on Basis of Age*


          Score on grip      Score on balance
Pair       D       H           D        H
 1        25      26          2.0      2.3
 2        22      22          2.0      1.0
 3        28      29          2.7      3.7
 4        35      39          2.7      3.3
 5        37      34          3.0     10.0
 6        48      51          1.7      4.3
 7        49      42          2.0      4.7
 8        54      54          2.0      7.0
 9        65      77          2.7      3.3
10        57      68          1.0      ET


* Data from Long.


For each trait make the appropriate computations and test the hy- 
pothesis that the difference between deaf and hearing girls is zero. 


Alternatives to a Hypothetical Mean. Suppose that a population
can be assumed to be normal and to have a standard deviation,
$\sigma = 10$, but that the mean $\mu$ is unknown. Suppose the
hypothesis is formulated that $\mu = 50$, and is tested by a sample
of 25 observations. The critical region is constructed at a significance
level of .05 and the assumption that $\mu$ is actually 50. Since
$\sigma/\sqrt{N}$ is 2 and $1.96\sigma/\sqrt{N}$ is 3.92 the critical region consists of all
values of $\bar{X}$ less than 46.08 or greater than 53.92.

If the hypothesis, $\mu = 50$, is true, the probability is .05 that
$\bar{X}$ will have a value in the critical region. But what is the probability
that $\bar{X}$ will have a value in the critical region when $\mu$ has
some value other than 50? Consider the three alternatives
$\mu = 48$, $\mu = 52$, and $\mu = 56$. If any one of these alternatives is true
this alternative will be the actual mean of the distribution of $\bar{X}$.
The standard error is unchanged for these alternatives since $\sigma$
is assumed known. The shaded areas in Figure 7-4 represent the
probabilities that $\bar{X}$ will fall in the critical region if each of the
alternatives $\mu = 48$, $\mu = 50$, $\mu = 52$, and $\mu = 56$ is true. These
probabilities are respectively .17, .05, .17, and .85.

Fig. 7-4. Distribution of $\bar{X}$ when $N = 25$, $\sigma = 10$, and $\mu$ = 48, 50, 52, or 56.
Under $H: \mu = 50$, critical region with $\alpha = .05$ is $\bar{X} < 46.1$ and $\bar{X} > 53.9$.
Shaded area indicates probability that $\bar{X}$ will fall into critical region. This
probability is .17 if $\mu = 48$, .05 if $\mu = 50$, .17 if $\mu = 52$, and .85 if $\mu = 56$.

These probabilities suggest that whenever the true value of $\mu$
differs greatly from the hypothetical value being tested (which in
this case is 50) the probability of rejecting the hypothesis is great
but when the true value of $\mu$ is not very different from the value
stated in the hypothesis tested, the probability of rejection is
small. In Figure 7-5 the horizontal axis represents values of $\mu$.
The hypothetical value being tested is presumed to be $\mu = 50$.

Fig. 7-5. Power function of test of hypothesis $\mu = 50$
when $N = 25$, $\sigma = 10$, and $\alpha = .05$. (Vertical axis: probability of
rejecting H, from 0 to 1.0. Horizontal axis: scale of $\mu$, from 40 to 60.)


An ordinate drawn to a given point on the baseline represents
for that value of $\mu$ the probability of rejecting the hypothesis
$\mu = 50$. The probability that a test of significance will lead to
rejection of a hypothesis is called the power of the test. The curve
shown in Figure 7-5 is called a power function. As already stated, 
the power function of a test plays an important part in planning 
the conduct of a study and the analysis of results. 
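
The ordinates of such a power function can be computed directly. The sketch below assumes the setting of the example (hypothetical mean 50, standard deviation 10, N = 25, two-sided test with the deviate 1.96) and builds the normal distribution function from `math.erf`; the function names are illustrative.

```python
import math

def normal_cdf(z):
    """Area under the unit normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def power_two_sided(mu_true, mu_h=50.0, sigma=10.0, n=25, z_crit=1.96):
    """Probability that the sample mean falls in the two-sided critical region."""
    se = sigma / math.sqrt(n)
    lower = mu_h - z_crit * se      # left boundary of the region of acceptance
    upper = mu_h + z_crit * se      # right boundary of the region of acceptance
    left_tail = normal_cdf((lower - mu_true) / se)
    right_tail = 1.0 - normal_cdf((upper - mu_true) / se)
    return left_tail + right_tail

for mu in (48, 50, 52, 56):
    print(mu, round(power_two_sided(mu), 2))
# prints 0.17, 0.05, 0.17 and 0.85 for the four values of mu
```

Evaluating the function over a finer grid of values of the true mean traces out the whole curve.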

Choice of Critical Region. The choice of one-sided or two- 
sided critical region has already been discussed in Chapter 3 in 
relation to proportions. A brief discussion of this problem in 
relation to means seems desirable. Three situations will be con- 
sidered. 

A. $H: \mu_1 = \mu_2$. Under this hypothesis it is necessary to guard
oneself against both the alternatives $\mu_1 > \mu_2$ and $\mu_1 < \mu_2$. The
risk of being wrong can then be minimized best by using a two-sided
critical region, for such a region leads to high probabilities of
rejection of the hypothesis both when $\mu_1$ is much greater and when it is much
less than $\mu_2$. The critical region is then $t < t_{\frac{1}{2}\alpha}$ and $t > t_{1-\frac{1}{2}\alpha}$.

B. $H: \mu_1 < \mu_2$. This hypothesis means that one is satisfied
if $\mu_1$ is less than $\mu_2$ regardless what the difference may be. The
alternatives against which it is necessary to guard oneself are
$\mu_1 > \mu_2$. The critical region which leads most often to rejection
of the hypothesis when it is false is $t > t_{1-\alpha}$.

C. $H: \mu_1 > \mu_2$. By analogous reasoning the critical region
for this hypothesis is $t < t_{\alpha}$.

The Number of Cases Needed to Reject a Hypothesis Concerning
$\mu$ When Some Alternative Is True. A problem similar in
purpose to this one was discussed in Chapter 3 but there the 
hypothesis related to the population proportion. The decision 
as to how large a sample shall be taken is always a very funda- 
mental aspect of the plans for an investigation. If a difference 
actually exists between two population means but the samples 
taken are too small, the observed difference in the sample means 
may be nonsignificant. Then all the work of the study is wasted, 
whereas somewhat larger samples would have produced a positive 
conclusion. On the other hand, to take samples larger than neces- 
sary to establish the mean difference is a waste of time and money 
which could be better expended otherwise. When an observed 
difference is nonsignificant, the research worker is usually in a 
quandary as to how to interpret his findings. Is the real difference 
zero, or has he used too small a sample to establish a difference 
which actually exists? He can make a far stronger statement in 
this situation if he has determined in advance how many cases he 
would need to find a difference of a predetermined size with a 
specified risk of error. Having determined the value of N in ad- 
vance by a little algebra, if he carries out the study as planned and 
obtains a nonsignificant result, he can dismiss the argument that 
his sample was too small, and can state, with a specified degree 
of confidence, that the population difference is less than the pre- 
determined value. 

To make the decision as to size of N, practical experience 
must be relied on for answers to these questions: 

1. How large a difference (d) would it be of practical importance
to find if it exists in the population? For example, a clinical
psychologist may say “If schizophrenics differ from normals by
so-and-so much I want to reject the hypothesis that their means
are equal; if the difference is less than that amount I do not care
whether we find it or not.”
2. What estimate can be made for the population variance? 




Unless the area of investigation is entirely new, it is often possible 
to get from previous studies a rough estimate of the variance, or 
perhaps an estimate that it lies between two specified values. 

3. How much risk can be taken of deciding that a difference
exists when it really is zero? This decision is the choice of $\alpha$.

4. How much risk can be taken of deciding that a difference
is zero when it really is as large as the predetermined value d?
This is $\beta$, the risk of error of the second kind.

When these four values ($d$, $\sigma^2$, $\alpha$, and $\beta$) have been chosen on
nonstatistical grounds, the statistician can determine the necessary
size of N by some easy algebra.

Suppose we wish to test the hypothesis that $\mu = 75$ with a
two-sided test at .02 significance level. We have some prior knowledge
about variability and we hazard a guess that $\sigma = 8$. We
decide that if the actual value of $\mu$ differs from 75 by 5 points or
more in either direction, we wish to detect the existence of a difference
d with probability .60, that is, we want $\beta$ to be no greater
than .40 for any alternative outside the range 70 to 80. How large
a sample will be required?

The specified values are these:


$\mu_H = 75$ = value under the hypothesis
$\mu_1 = 80$ = one alternative value, $d = 80 - 75 = 5$
$\mu_2 = 70$ = second alternative value, $d = 70 - 75 = -5$
$\alpha = .02$, $\beta = .40$, $\sigma = 8$

(Sketch: two sampling distributions of $\bar{X}$, one centered at 75 and one at 80, with the points B and C marked on the baseline.)


The solution is analogous to that given on page 69 for a hypothesis
about a proportion. In the adjacent sketch, the curve
with mean $\mu_H = 75$ represents the sampling distribution of $\bar{X}$ under
the hypothesis, and the line segment BC is the region of acceptance
for $H: \mu = 75$. This region has probability .98. Then

$C = 75 + z_{.99}\sigma_{\bar{X}}$ and $B = 75 - z_{.99}\sigma_{\bar{X}}$.

If $\mu = 80$ instead of 75, the curve on the right represents the actual
sampling distribution of $\bar{X}$ and the area under this curve to the
left of the ordinate at C is the probability of false acceptance of
$H: \mu = 75$ when $\mu = 80$. Since we have said this probability is
not to be larger than $\beta = .40$, the point C must be $z_{.40}\sigma_{\bar{X}}$ units
from 80, or $C = 80 + z_{.40}\sigma_{\bar{X}}$. As $\beta$ will always be chosen to be
less than .50, $z_\beta$ will always be negative. If $\mu$ is actually larger
than 80, $\beta$ will be even smaller than if $\mu = 80$, and so the requirements
of the problem are fulfilled.

Then $C = 75 + z_{.99}\sigma_{\bar{X}}$ and also $C = 80 + z_{.40}\sigma_{\bar{X}}$

Hence $75 + z_{.99}\sigma_{\bar{X}} = 80 + z_{.40}\sigma_{\bar{X}}$

or $80 - 75 = \sigma_{\bar{X}}(z_{.99} - z_{.40}) = -\sigma_{\bar{X}}(z_{.01} + z_{.40})$

$5 = \dfrac{8}{\sqrt{N}}(2.326 + .253)$

and $N = \left(\dfrac{8(2.326 + .253)}{5}\right)^2 = 17.0$


If $\mu_A$ should be larger than 80, the curve representing the
actual sampling distribution will be farther to the right so $\beta$ will
be less than .40 and the computed value of N will be less than 17.
Taking N = 17 will then amply satisfy the requirements of the
problem.

The argument for $\mu_A = 70$ is similar. The curve representing
the actual sampling distribution is to the left and not shown in
the diagram.

Then $B = 75 + z_{.01}\sigma_{\bar{X}}$ and $B = 70 + z_{.60}\sigma_{\bar{X}}$

and therefore $75 + z_{.01}\sigma_{\bar{X}} = 70 + z_{.60}\sigma_{\bar{X}}$

$75 - 70 = \sigma_{\bar{X}}(z_{.60} - z_{.01})$

$5 = \dfrac{8}{\sqrt{N}}(.253 + 2.326)$

and $N = \left(\dfrac{8(2.579)}{5}\right)^2 = 17.0$ as before.


If a one-sided test had been used, the only difference would
have been in the position of C with respect to $\mu_H$. Its position
with respect to $\mu_A$ would be unchanged. Then $C = 75 + z_{.98}\sigma_{\bar{X}}$,

and $N = \left(\dfrac{8(2.054 + .253)}{5}\right)^2 = 14$

In general if $d = \mu_A - \mu_H$ is the departure from hypothesis
which it is desired to detect, N may be calculated as follows:

(7.30) For the two-sided test, $N = \left[\dfrac{\sigma}{d}\left(z_{\frac{1}{2}\alpha} + z_\beta\right)\right]^2 = \dfrac{\sigma^2}{d^2}\left(z_{\frac{1}{2}\alpha} + z_\beta\right)^2$

(7.31) and for the one-sided test, $N = \left[\dfrac{\sigma}{d}\left(z_\alpha + z_\beta\right)\right]^2 = \dfrac{\sigma^2}{d^2}\left(z_\alpha + z_\beta\right)^2$


Both formulas indicate that as the discrepancy becomes larger it 
can be detected with a smaller sample and as d becomes very 
small only a large sample can cause rejection of the hypothesis 
when it is false. 
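
The two formulas can be sketched as a single routine. The sketch assumes that the normal deviates are taken as lower percentiles (both negative, so that squaring removes the sign) and obtains them from `statistics.NormalDist` rather than from a printed table; the function name is illustrative.

```python
from statistics import NormalDist

def sample_size_one_mean(sigma, d, alpha, beta, two_sided=True):
    """Number of cases needed to detect a departure d from the hypothetical mean."""
    z = NormalDist().inv_cdf                 # percentile of the unit normal distribution
    tail = alpha / 2 if two_sided else alpha
    # z(tail) and z(beta) are both negative when tail and beta are below .50
    return (sigma / d) ** 2 * (z(tail) + z(beta)) ** 2

n_two = sample_size_one_mean(sigma=8, d=5, alpha=0.02, beta=0.40)
n_one = sample_size_one_mean(sigma=8, d=5, alpha=0.02, beta=0.40, two_sided=False)
print(round(n_two, 1), round(n_one, 1))
```

For the example in the text the two-sided requirement comes to about 17 cases and the one-sided requirement to a little under 14, which is rounded up to 14.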

Now suppose we do not know the value of $\sigma$ but we have two
or more estimates available, of which $\sigma_1$ is the smallest and $\sigma_2$ the
largest. If we substitute $\sigma_1$ for $\sigma$ in the formula we shall obtain
an estimate $N_1$ of the necessary sample size. If we substitute
$\sigma_2$ for $\sigma$ we shall obtain a second estimate $N_2$, larger than $N_1$.
While it cannot be guaranteed that either sample size will exactly 
meet the specifications of the problem, these numbers provide 
estimates which are a vastly better guide to research planning 
than the blind guess all too often followed. 

Suppose there are two populations presumed to have the same
variance $\sigma^2$ and we wish to test the hypothesis $\mu_1 - \mu_2 = 0$, with
$\alpha$ and $\beta$ chosen arbitrarily as before. How many cases shall be
taken in each sample? If the cost of obtaining a single case is
the same for each sample, the best procedure is to take samples
of the same size, $N_1 = N_2 = N$, and to determine the number of
cases in each by the formula

(7.32) $N = \dfrac{2\sigma^2}{d^2}(z_\alpha + z_\beta)^2$ for the one-sided test.

The formula for the two-sided test is similar with $\frac{1}{2}\alpha$ substituted
for $\alpha$.

Suppose the two populations are presumed to have unequal
variances and these are estimated to be $\sigma_1^2$ and $\sigma_2^2$. Again we
wish to test the hypothesis $\mu_1 = \mu_2$. If the cost of obtaining a
single case is the same for each sample, the best procedure is to
make sample size proportional to the standard deviations and
to let

(7.33) $N_1 = \dfrac{\sigma_1(\sigma_1 + \sigma_2)}{d^2}(z_\alpha + z_\beta)^2$

(7.34) and $N_2 = \dfrac{\sigma_2(\sigma_1 + \sigma_2)}{d^2}(z_\alpha + z_\beta)^2$

for the one-sided test. In the corresponding formula for the two-sided
test, $z_{\frac{1}{2}\alpha}$ takes the place of $z_\alpha$.

By way of illustration, suppose the person planning a research
study estimates $\sigma_1 = 5$ and $\sigma_2 = 8$ as a fair approximation for the
standard deviations of the two populations, and chooses $\alpha = .01$
and $\beta = .20$. He thinks it important to be able to detect a difference
between the means if it is 1.5 or larger. The statistician
then estimates that the number of cases to be taken from the two
populations should be

$N_1 = \dfrac{25 + 40}{(1.5)^2}(2.326 + .842)^2 = 290$

$N_2 = \dfrac{40 + 64}{(1.5)^2}(2.326 + .842)^2 = 464$

and verifies that $\dfrac{N_1}{N_2} = \dfrac{5}{8}$.

If the person paying for the study feels that $N = N_1 + N_2 = 754$
is larger than he can afford, he may revise his choice of $\alpha$, of $\beta$, or
of $d$. The ratio $\dfrac{N_1}{N_2}$ will remain $\dfrac{5}{8}$ in any case.

Verify the following results if $\sigma_1 = 5$, $\sigma_2 = 8$ and a one-tailed
test is used:

If $\alpha = .05$, $\beta = .20$ and $d = 1.5$, $N_1 = 179$, $N_2 = 286$ and $N = 465$
If $\alpha = .01$, $\beta = .30$ and $d = 1.5$, $N_1 = 235$, $N_2 = 375$ and $N = 610$
If $\alpha = .01$, $\beta = .20$ and $d = 2$, $N_1 = 163$, $N_2 = 261$ and $N = 424$
If $\alpha = .05$, $\beta = .30$ and $d = 2$, $N_1 = 76$, $N_2 = 122$ and $N = 198$

Standard Error of $\bar{X}$ in Samples from a Finite Population.
The formula $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{N}}$ is correct when the sample is small in comparison
with the population. This situation is usually true of
experimental studies. In such studies the population which is 
sampled may not even be entirely in existence at the time of 
the experiment. Thus, the results of a study on children may be 
applied to children not yet born at the time of the study. In 
experimental studies the populations are usually regarded as 
infinitely large. 

Another type of study is one which applies to a well-defined 
population of finite size. The purpose of the investigation may be 
to determine the average score of students in a particular college 
on a scale measuring their attitude toward a current college policy. 
Suppose the college population consists of M students. If the 
scale is given to all students then the mean and variance of the 
scores can be ascertained exactly by the formulas

(7.35) $\mu = \dfrac{\sum_{i=1}^{M} X_i}{M}$

$\sigma^2 = \dfrac{\sum_{i=1}^{M}(X_i - \mu)^2}{M}$


If the purpose of the study is to measure only the attitude of 
the students actually enrolled in the college the mean in Formula 
(7.35) provides the answer without sampling error. The standard 
error of $\mu$ is zero. There may, of course, be other errors, such as
unreliability of the scale, but sampling error has been eliminated 
by using the entire population. 

It may be uneconomical or otherwise undesirable to apply the 
scale to all M students. A sample of N may, therefore, be selected 
at random. An average calculated from this sample is subject 
to sampling error and a standard error formula is required. It is 
to be expected that the formula will reflect not only the size of 
the sample but also this size in relation to the size of the popula- 
tion sampled. 

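If the sample mean is written

(7.36) $\bar{X} = \dfrac{\sum_{i=1}^{N} X_i}{N}$

then

(7.37) $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{N}}\sqrt{\dfrac{M - N}{M - 1}}$

where $\sigma$ is the population standard deviation defined with Formula (7.35). Since $\sigma$ is not known the
sample estimate $s$ is used and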


(7.38) $s_{\bar{X}} = \dfrac{s}{\sqrt{N}}\sqrt{\dfrac{M - N}{M - 1}}$


An examination of Formula (7.37) shows that $\sigma_{\bar{X}} = 0$ when
$N = M$, as is to be expected where the sample consists of the entire
population. If $N$ is small in comparison to $M$, the factor $\dfrac{M - N}{M - 1}$
is close to 1 and may be disregarded. This leads to the formula for
the standard error of a mean from an infinite population.
If $M$ is large, $M - 1$ may be replaced by $M$, and the factor
$\sqrt{\dfrac{M - N}{M - 1}}$ may be written with close approximation as $\sqrt{1 - \dfrac{N}{M}}$,
where $\dfrac{N}{M}$ is the fraction of the population included in the sample.
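
The effect of the correction factor can be seen in a short sketch. It assumes the factor is the square root of (M - N)/(M - 1) applied to the usual standard error, with the sample estimate s in place of the population standard deviation. The function names and the numbers in the example are hypothetical.

```python
import math

def se_mean_infinite(s, n):
    """Standard error of the mean when the population is regarded as infinite."""
    return s / math.sqrt(n)

def se_mean_finite(s, n, m):
    """Standard error of the mean of n cases drawn from a population of m cases."""
    return se_mean_infinite(s, n) * math.sqrt((m - n) / (m - 1))

# Hypothetical survey: 400 cases drawn from a population of 2000, s = 10
s, n, m = 10.0, 400, 2000
print(round(se_mean_infinite(s, n), 4))              # without the correction
print(round(se_mean_finite(s, n, m), 4))             # with the correction
print(round(math.sqrt(1 - n / m), 4))                # approximate factor for large m
print(se_mean_finite(s, m, m))                       # whole population sampled
```

When the sample is a fifth of the population, as here, the correction reduces the standard error by roughly a tenth; when the whole population is taken it reduces it to zero.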




EXERCISE 7.5 

1. A mental test has been administered to 1000 students. After the
test has been given to the entire group it is decided to estimate the mean
score of the 1000 students from a sample of 500 papers. These yield
$\bar{X} = 35$ and $s = 5$.

a. What is the estimate of the standard error of $\bar{X}$?

b. What would it be if the number of students in the sample were 100?

c. 900?

Assume that $\bar{X}$ and $s$ are unchanged as sample size changes.

2. Given s = 5 as above, what would the standard error be for samples 
of (a) 500, (b) 100, and (c) 900, if the samples were considered to be drawn 
from an infinite population? By what proportion is the standard error 
reduced if finiteness of the population is taken into account? 

3. If s is 5 and the population is 1000, what must be the sample size 
to make the standard error 4 of its value calculated on the basis of an 
infinite population? 

4. Given $N_1 = 6$, $\bar{X}_1 = 29.5$, $s_1^2 = 24$

$N_2 = 5$, $\bar{X}_2 = 25.1$, $s_2^2 = 64$
Compute $t$ to test $H: \mu_1 - \mu_2 = 0$, with two-sided test and $\alpha = .01$

a. Using Formulas (7.21) and (7.22) with $n = 6 + 5 - 2$

b. Using Formula (7.25) with $n = 6 + 5 - 2$

c. Using Formula (7.25) with $n$ as given by (7.26)

Why do the results not agree? Which answer do you consider to be best?

5. As an algebraic exercise, show that if $N_1 = N_2$, the routine of Formula
(7.24) will give exactly the same result as Formulas (7.21) and (7.22).

6. As an algebraic exercise, show that if $s_1^2 = s_2^2$, the routine of Formula
(7.24) will give exactly the same result as Formulas (7.21) and (7.22).

7. In a sample of 12 cases, $\bar{X} = 31.8$ and $s = 4.8$.

a. Test $H: \mu = 35$ with $\alpha = .05$ using a two-sided test.

b. Test $H: \mu = 35$ with $\alpha = .05$ using a one-sided test with right tail
critical.

c. Test $H: \mu = 35$ with $\alpha = .05$ using a one-sided test with left tail
critical.

d. Compute the confidence interval for $\mu$ with confidence coefficient .95.

e. Compute the confidence interval for $\mu$ with confidence coefficient .99.

8. For each of the following situations a test is to be made of the
hypothesis that two population means are equal. Give the number of the
formula by which you would compute the standard error or variance of
the difference in sample means. After you divide the difference in sample
means by its standard error, would you consult the normal probability
table or a table of "Student's" distribution? If the latter, with what
number of degrees of freedom would you enter the table?

a. 300 pupils in the sixth grade are given a test in speed of reading in
September and again in May in order to compare their mean gain
with the mean gain for a similar period published as a norm by the
author of the test.

b. From each of 6 litters of white rats 2 rats of the same sex and
approximately equal weight are selected. One of these is fed diet A
and the other diet B. At the end of the experimental period it is
desired to test the hypothesis that the diets produce equal gains
in weight.

c. For two independent samples the following data are obtained:
$N_1 = 325$, $\bar{X}_1 = 52.3$, $s_1^2 = 24.2$, $N_2 = 111$, $\bar{X}_2 = 54.7$, $s_2^2 = 23.6$.
A test of $H: \sigma_1^2 = \sigma_2^2$ has been made and the hypothesis accepted.

d. For two independent samples the following data are obtained:
$N_1 = 14$, $\bar{X}_1 = 26.3$, $s_1^2 = 18.5$, $N_2 = 19$, $\bar{X}_2 = 25.4$, $s_2^2 = 18.8$.
A test of $H: \sigma_1^2 = \sigma_2^2$ has been made and the hypothesis accepted.

e. For two independent samples the following data are obtained:
$N_1 = 390$, $\bar{X}_1 = 63.2$, $s_1^2 = 10.3$, $N_2 = 406$, $\bar{X}_2 = 63.2$, and $s_2^2 = 38.3$.
A test of $H: \sigma_1^2 = \sigma_2^2$ is made and the hypothesis rejected.

f. For two independent samples, the following data are obtained:
$N_1 = 5$, $\bar{X}_1 = 19.2$, $s_1^2 = 42$, $N_2 = 9$, $\bar{X}_2 = 24.3$, $s_2^2 = 36$.
A test of the hypothesis $\sigma_1^2 = \sigma_2^2$ has been made and the hypothesis
rejected.
9. A research worker is considering the following plans for a
two-sided test of H: μ = 70:


(1) N = 40,  α = .01        (4) N = 200, α = .05
(2) N = 100, α = .01        (5) N = 500, α = .05
(3) N = 200, α = .02        (6) N = 500, α = .06


a. Which one will give him the most powerful test? 

b. Which will give him the least powerful test? 

c. Which will incur the least risk of rejecting the hypothesis if it is 
true? 

d. If he accepts the hypothesis, which will allow him to make the 
strongest statement? 

10. For each of the following requirements, ascertain how many cases
should be included in the sample or samples. Use a one-sided test.

1. Hypothesis: μ_H = 25
   Alternative: μ_A = 23
   σ = 12, α = .01, β = .30

2. Hypothesis: μ₁ − μ₂ = 0
   Alternative: μ₁ − μ₂ = 3
   σ₁ = 15, σ₂ = 15, α = .02, β = .20

3. Hypothesis: μ₁ − μ₂ = 0
   Alternative: μ₁ − μ₂ = 2
   σ₁ = 12, σ₂ = 20, α = .05, β = .40

4. Hypothesis: μ₁ − μ₂ = 3
   Alternative: μ₁ − μ₂ = 6
   σ₁ = 5, σ₂ = 5, α = .02, β = .20

5. Hypothesis: μ₁ − μ₂ = 10
   Alternative: μ₁ − μ₂ = 5
   σ₁ = 10, σ₂ = 20, α = .05, β = .40




Design of Samples in Surveys. One of the most crucial aspects 
of a sample survey is the decision as to the number of cases and 
the methods by which they shall be selected. The answer to this 
question provides the design of the survey. This and other problems
related to surveys are discussed by Deming³ and Yates.²³

Up to this point, the discussion of sampling error has assumed 
that each element in the sample is drawn at random from the 
entire population. This method of sampling is called unrestricted 
random sampling or sometimes simple sampling. In some situa- 
tions, unrestricted random sampling is impossible. In other 
situations an alternative method may be more advantageous. 

The first requirement of any sampling procedure is the avoid- 
ance of bias. The error which causes a statistic to differ from its 
parameter may arise partly from random errors in the selection of 
individuals and partly from bias in their selection. The total of 
the random errors decreases as sample size increases but the total 
of errors due to bias does not so decrease. It forms a constant
error which on the average is about the same for a large as for 
a small sample. 
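
The contrast between random error and bias can be illustrated by a small simulation. The sketch below is not from the text: the population of the integers 1 to 1000, the biased rule (larger values are more likely to be chosen) and the function name `error_summary` are all assumptions made for illustration.

```python
import random
import statistics

# Hypothetical population: the integers 1..1000 (mean 500.5).
population = list(range(1, 1001))
true_mean = statistics.fmean(population)

def error_summary(sample_size, biased, rng, repeats=200):
    """Return (mean, standard deviation) of the errors of repeated sample means."""
    errors = []
    for _ in range(repeats):
        if biased:
            # Assumed biased rule: chance of selection proportional to the value.
            sample = rng.choices(population, weights=population, k=sample_size)
        else:
            # Unrestricted random sampling without replacement.
            sample = rng.sample(population, sample_size)
        errors.append(statistics.fmean(sample) - true_mean)
    return statistics.fmean(errors), statistics.stdev(errors)

rng = random.Random(0)
random_small = error_summary(10, False, rng)
random_large = error_summary(400, False, rng)
biased_small = error_summary(10, True, rng)
biased_large = error_summary(400, True, rng)

for label, (avg, spread) in [("random, N=10", random_small),
                             ("random, N=400", random_large),
                             ("biased, N=10", biased_small),
                             ("biased, N=400", biased_large)]:
    print(f"{label:14s} average error {avg:8.1f}   spread {spread:6.1f}")
```

Under these assumptions the spread of the errors shrinks as the sample grows in both cases, but the average error of the biased procedure stays at about the same size for large and small samples.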

The next requirement of a "good" sampling procedure is that 
it provide as much information about the population as can be 
obtained at a stipulated cost or that it provide a stipulated amount 
of information for as low a cost as possible. This requirement 
raises problems concerning the size of sample, the method of se- 
lecting cases, and the allocation of cases. Adequate discussion 
of these problems would require a large monograph and is out- 
side the scope of this book. Here we shall make brief mention 
of several alternatives to unrestricted random sampling, and shall 
give the standard error of the mean and related formulas for two 
of them. 

The advantages of a particular method can be more easily 
discussed if we first consider the meaning of the terms frame and 
sampling unit. The following quotation is from pages 20 and 21 
of Yates.²³


“All rigorous sampling demands a subdivision of the material to be
sampled into units, termed sampling units, which form the basis of the 
actual sampling procedure. These units may be natural units of the 
material, such as individuals in a human population, or natural aggregates 
of such units, such as households, or they may be artificial units, such as 
rectangular areas on a map, bearing no relation to the natural subdivisions
of the material.

“It is not always necessary to make an actual subdivision of the whole
of the material before selection of the sample, provided the selected units 
can be clearly and unambiguously defined. Thus, with sampling units 
which are rectangular areas on a map there is no need to demarcate all 
these areas; they can be defined by co-ordinates, and the selected areas 
demarcated after selection. 

“Clear and unambiguous definition demands the existence or con- 
struction of some form of frame. In the sampling of a human population, 
for instance, with households as sampling units, there must be available 
a list of all households, and this list must be such that any household 
selected from it can be unambiguously located. In area sampling from 
maps, the maps must be such that the selected areas can be unambiguously 
defined on the ground. ... 

“Sampling units may be of the same or differing size. They may contain 
the same, or approximately the same, number of natural units, or they may 
contain widely differing numbers. The whole procedure of sampling, in- 
cluding the estimation of the population values and the sampling errors, is 
simplest when the sampling units are of approximately the same size and 
contain approximately the same number of natural units. Often, how- 
ever, the material is such that this condition cannot be conveniently ful- 
filled. In particular, if the natural units are themselves of widely differing 
size, variation in size of the sampling units or in the number of natural units 
they contain is inevitable.” 


Among the more common alternatives to unrestricted random 
sampling are these: 

A. Conscious selection of “typical” cases. This unsatisfactory 
method is in widespread use. The literature is full of instances 
showing bias of one kind or another in such samples. For example, 
Yates²³ (p. 12) spread out some 1200 stones on a table and asked
12 observers to select three samples of 20 stones each which should 
represent as nearly as possible the size distribution of the whole 
collection. Of the 36 samples, 30 had means larger than the mean 
of the collection. 

B. Systematic selection from a list. A very frequent method 
of drawing a sample is to take every 10th or every 20th, or in 
general every rth name, on a list. If the first name is chosen 
at random from among the first r names, every name has the same 
opportunity of being included in the sample and that is important. 
However not all samples are equally likely, in fact some samples 
would have probability zero. Suppose the list is alphabetical and 
includes several members of the same family with the same sur- 
name. If one of these is selected the others almost certainly will
not be, and so observations are not independent. This method
is often far less expensive than unrestricted random sampling and 
it is usually free of bias, unless there are periodic features in the 
list which coincide with the sampling interval. It is usually not 
possible to get a proper estimate of the standard error of a statistic 
from such a sample; the ordinary formulas applicable to unre- 
stricted random sampling may be presumed to overestimate 
sampling variability. 

C. Sampling after stratification of the population. The popula- 
tion is subdivided into a number of groups or “strata” and a 
random sample selected from each stratum. It is most efficient 
when the strata are so defined that there is a high degree of uni- 
formity among the units within each stratum and considerable 
diversity from one stratum to another. It has two purposes: 
(1) to increase the accuracy of the population estimates and (2) to
assure adequate sampling from any subgroup which is of par- 
ticular interest. The stratification affects the variability of any 
computed statistics, as indicated on pages 174 and 175. 

D. Cluster sampling. Here the sampling units are themselves 
groups or clusters of natural units. Thus the sampling units might 
be households composed of individual persons; or city blocks 
composed of dwelling units; or townships composed of farms; or 
classes composed of students; or groups of telephone poles erected 
near each other, and so on. A random sample is taken of all the
clusters composing the population and observations are made on 
all the individuals in the selected clusters. Cluster sampling is 
discussed further on pages 175-177. 

E. Multistage sampling. This term connotes a more complex 
plan with greater flexibility of pattern than the preceding. The 
sampling is carried out in successive stages, and any of the pre- 
ceding plans may be applied at any of the stages. For instance 
in the first stage a sample of clusters might be chosen at random 
from the entire population and then in the second stage a sample 
of individuals be chosen at random from the selected clusters. 
There are a great many possible variants of multistage sampling 
and each has its own set of formulas for sampling variability. 
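
Systematic selection from a list (method B above) can be sketched in a few lines. The list of 100 names, the interval r = 10 and the function name `systematic_sample` are hypothetical; the point of the sketch is only that a random start gives every name the same chance of inclusion while allowing no more than r distinct samples.

```python
import random

def systematic_sample(names, r, rng=random):
    """Take every r-th name, starting at a random position among the first r."""
    start = rng.randrange(r)      # random start: 0, 1, ..., r - 1
    return names[start::r]

# Hypothetical list of 100 names and a sampling interval of 10.
names = [f"name{i:03d}" for i in range(100)]
r = 10

# Only r different samples can ever occur, one for each starting position.
possible_samples = [names[start::r] for start in range(r)]

# Each name belongs to exactly one of the possible samples, so its chance
# of inclusion is 1/r, yet most combinations of 10 names can never be drawn.
print(len(possible_samples), "possible samples of", len(possible_samples[0]), "names")
print(systematic_sample(names, r, random.Random(1)))
```

Two names that are adjacent on the list never appear in the same sample, which is the lack of independence noted under method B.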

Stratified Sampling.* This procedure is possible only when
there is some previous information about the population. It is
efficient only if the strata or subpopulations can be selected in such
a way that the variance of each stratum is smaller than the vari-
ance of the entire unstratified population.

* For a minimum course this and the succeeding sections of this chapter may be
omitted without impairing the clarity of subsequent chapters.

Stratification of a community is often on a geographical basis, 
as when the residents of a city are classified by census tract; or 
the residents of a state are classified according to whether they 
live in open country, in villages, or in cities of various specified 
size. Stratification may be by some easily identified characteristic 
of individuals which is presumed to be related to the purpose of 
the inquiry. 

Suppose that a finite population has been divided into strata
and that

k = the number of strata

M_i = the number of individuals in the ith stratum

\( M = \sum_{i=1}^{k} M_i \) = the number of individuals in the population

N_i = the number of individuals in the sample taken from the
ith stratum

\( N = \sum_{i=1}^{k} N_i \) = the number of individuals in the sample

\( \bar{X}_i = \frac{\sum X}{N_i} \) = mean of the observations on the N_i individuals
in the sample from the ith stratum

\( s_i^2 = \frac{\sum (X - \bar{X}_i)^2}{N_i - 1} \) = variance of the observations on the N_i in-
dividuals in the sample from the ith stratum


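Then μ, the mean of the population, is estimated by the formula

(7.39)    \[ \bar{X}_s = \frac{\sum M_i \bar{X}_i}{M} \]

and μ = E(X̄_s). The variance of X̄_s is estimated by the formula

(7.40)    \[ s^2_{\bar{X}_s} = \sum \left( \frac{M_i}{M} \right)^2 \frac{M_i - N_i}{M_i - 1} \cdot \frac{s_i^2}{N_i} \]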


Quite often the information sought is not an estimate of the
average μ but of the total Mμ. That estimate is

(7.41)    \[ T_s = M \bar{X}_s = \sum M_i \bar{X}_i \]

and Mμ is the expected value of T_s. The variance of T_s is estimated as

(7.42)    \[ s^2_{T_s} = \sum M_i^2 \, \frac{M_i - N_i}{M_i - 1} \cdot \frac{s_i^2}{N_i} \]
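
A minimal sketch of the routine of Formulas (7.39) to (7.42) follows. The two strata and their figures are invented for illustration, and the function name `stratified_estimates` and the dictionary keys are assumptions, not notation from the text.

```python
def stratified_estimates(strata):
    """Estimates for a stratified sample.

    strata: list of dicts with keys
        "M"    number of individuals in the stratum,
        "N"    number of individuals sampled from the stratum,
        "mean" sample mean in the stratum,
        "var"  sample variance s_i^2 in the stratum.
    """
    M = sum(s["M"] for s in strata)
    # (7.39): weighted mean of the stratum means
    mean_s = sum(s["M"] * s["mean"] for s in strata) / M
    # (7.40): variance of the estimated mean, with finite-population factor
    var_mean = sum(
        (s["M"] / M) ** 2
        * (s["M"] - s["N"]) / (s["M"] - 1)
        * s["var"] / s["N"]
        for s in strata
    )
    # (7.41): estimated total
    total = M * mean_s
    # (7.42): the sum of M_i^2 terms equals M^2 times the variance in (7.40)
    var_total = M ** 2 * var_mean
    return mean_s, var_mean, total, var_total

# Hypothetical data for two strata.
strata = [
    {"M": 600, "N": 30, "mean": 50.0, "var": 100.0},
    {"M": 400, "N": 20, "mean": 60.0, "var": 64.0},
]
mean_s, var_mean, total, var_total = stratified_estimates(strata)
print(mean_s, round(var_mean, 4), total, round(var_total, 1))
```

For these invented figures the estimated mean is 54.0 and its estimated variance is about 1.63; the estimated total is 54,000.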


Sometimes the information sought is an estimate of the propor- 
tion P of individuals in the population possessing some charac- 
teristic, say the proportion of homes in a rural area that have 
central heating, or the proportion of voters who will express a 
preference for a given presidential candidate. The estimate of P 
for the entire population is then 

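(7.43)    \[ p_s = \frac{\sum M_i p_i}{M} \]

where p_i is the proportion observed in the sample from the ith stratum
and q_i = 1 − p_i, and the variance of p_s is

(7.44)    \[ s^2_{p_s} = \sum \left( \frac{M_i}{M} \right)^2 \frac{M_i - N_i}{M_i - 1} \cdot \frac{p_i q_i}{N_i} \]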


For large samples the various estimates described above are 
approximately normally distributed about their respective param- 
eters as mean, with the square root of the indicated variance as 
standard deviation. Therefore, confidence intervals for the param- 
eters can be obtained by use of tables of the normal distribution. 
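
The large-sample interval just described can be sketched for a stratified proportion. The strata below are invented. The variance term p_i q_i / N_i is an assumed reading of Formula (7.44), chosen to parallel Formula (7.40), and should be checked against a clean copy of the text. `NormalDist` from the Python standard library stands in for the printed normal table.

```python
from math import sqrt
from statistics import NormalDist

def stratified_proportion(strata, confidence=0.95):
    """Estimate P from a stratified sample and give a normal-theory interval.

    strata: list of dicts with keys "M" (stratum size), "N" (sample size)
    and "p" (sample proportion in the stratum).
    """
    M = sum(s["M"] for s in strata)
    p_s = sum(s["M"] * s["p"] for s in strata) / M            # (7.43)
    # Assumed form of (7.44): p_i * q_i / N_i in place of s_i^2 / N_i.
    var_p = sum(
        (s["M"] / M) ** 2
        * (s["M"] - s["N"]) / (s["M"] - 1)
        * s["p"] * (1 - s["p"]) / s["N"]
        for s in strata
    )
    z = NormalDist().inv_cdf(0.5 + confidence / 2)            # 1.96 for .95
    half_width = z * sqrt(var_p)
    return p_s, var_p, (p_s - half_width, p_s + half_width)

# Hypothetical survey: two strata of homes.
strata = [
    {"M": 5000, "N": 200, "p": 0.30},
    {"M": 3000, "N": 150, "p": 0.50},
]
p_s, var_p, (low, high) = stratified_proportion(strata)
print(round(p_s, 3), round(low, 3), round(high, 3))
```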

The problem of how large a sample to take and how to appor- 
tion the cases among the strata depends upon such considerations 
as the amount of money which can be spent on the collection of 
data, the cost of locating an individual, the cost of making an 
observation on an individual, the degree of precision required in 
the estimate and the relative variability of the different strata. 
Formulas for the optimum allocation of cases to strata will not be 
presented here because the discussion necessary to safeguard their 
use requires more space than the scope of this book justifies. A 
clear presentation of these is given by David.² The reader who
contemplates making a survey by sampling should also read 
Cochran, Deming, Hansen, King, Marks, Neyman, Stephan, and 
Yates. (See references at the end of this chapter.) 

Cluster Sampling. For this method the population is divided 
into many relatively small groups or clusters of individuals and 
the sample consists of a number of these clusters chosen at random. 

In the formulas given here it will be assumed that all the in- 
dividuals are observed in those clusters which are chosen, but this 
need not be so. It often happens that a number of clusters is 
selected at random and then out of those selected clusters, a pre- 
determined number of individuals is chosen at random. 




Cluster sampling is economical when the cost of measuring 
an individual is relatively small and the cost of reaching him 
relatively large. For example, suppose a railroad is making a 
survey of the ties on which rails rest in order to assess the need 
for replacements. To take an unrestricted random sample of all 
the ties in use, would entail heavy costs to travel to the place where 
the selected ties might be found but very little cost to examine 
the ties once they are reached. It might cost less and yield more 
information to measure 100 clusters of 5 ties each than to measure 
200 ties chosen individually at random. It would produce more 
information to measure 100 clusters of 5 ties each than to measure 
5 clusters of 100 ties each. The relation of costs to sampling 
procedure is outside the scope of this text, but it is well worth the 
careful investigation of anyone contemplating a large inquiry
by sample. 

Cluster sampling is most advantageous when there is great 
heterogeneity within clusters. If all the individuals in each cluster 
were exactly alike there would be no advantage in observing more 
than one of them. The larger the variation within clusters, the
smaller the number of individuals which will be required to hold
the standard error of the mean to a predetermined size.

The formulas for cluster sampling rather than for simple sam- 
pling are usually needed in studies that utilize area sampling, as 
when small areas like city blocks or rural townships are chosen at 
random and every dwelling in those areas, or every farm, or every 
individual under consideration, is included in the sample. These 
formulas would be needed if a population of all the eighth grade 
pupils in New York City were to be sampled by selecting at ran- 
dom a number of eighth grade classes and making observations 
on all the pupils in each class. 


Let M = the number of clusters in the population

m = the number of clusters in the sample

n_i = the number of individuals in the ith cluster

\( \bar{X}_i = \frac{\sum X}{n_i} \) = mean of the ith cluster

\( \bar{n} = \frac{1}{m} \sum_{i=1}^{m} n_i \) = average number of individuals per cluster

\( N = \sum n_i = m\bar{n} \) = number of individuals in the sample




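Then the sample estimate of the population mean μ is

(7.45)    \[ \bar{X}_c = \frac{\sum n_i \bar{X}_i}{N} \]

and the variance of X̄_c is

(7.46)    \[ s^2_{\bar{X}_c} = \frac{M - m}{M m \bar{n}^2} \cdot \frac{\sum n_i^2 (\bar{X}_i - \bar{X}_c)^2}{m - 1} \]

In the special case for which \( n_1 = n_2 = \cdots = n_m = \bar{n} \), Formulas
(7.45) and (7.46) can be simplified. Then the estimate of μ is

(7.47)    \[ \bar{X}_c = \frac{\sum \bar{X}_i}{m} \]

and its variance is

(7.48)    \[ s^2_{\bar{X}_c} = \frac{M - m}{M m} \cdot \frac{\sum (\bar{X}_i - \bar{X}_c)^2}{m - 1} \]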


In dealing with proportions in samples obtained by cluster
sampling it is only necessary to substitute p_i for X̄_i and p_c for X̄_c
in Formulas (7.45) to (7.48).
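
The cluster formulas can be sketched as follows. The three clusters and the population of 30 clusters are invented. The form used for Formula (7.46) is an assumption: it is the reading that reduces to (7.48) when all clusters are the same size, and it should be checked against a clean copy of the text or the derivation by Marks.

```python
def cluster_estimates(cluster_values, M):
    """Estimate the population mean and its variance from a cluster sample.

    cluster_values: one list of observations for each sampled cluster
                    (every individual in a chosen cluster is observed).
    M: number of clusters in the population.
    """
    m = len(cluster_values)
    sizes = [len(c) for c in cluster_values]
    means = [sum(c) / len(c) for c in cluster_values]
    N = sum(sizes)
    n_bar = N / m
    # (7.45): cluster means weighted by cluster size
    mean_c = sum(n * x for n, x in zip(sizes, means)) / N
    # (7.46), assumed form: reduces to (7.48) when every n_i equals n_bar
    weighted_ss = sum(n ** 2 * (x - mean_c) ** 2 for n, x in zip(sizes, means))
    var_c = (M - m) / (M * m * n_bar ** 2) * weighted_ss / (m - 1)
    return mean_c, var_c

# Hypothetical sample: 3 clusters of 2 individuals from a population of 30 clusters.
clusters = [[4, 6], [8, 10], [5, 7]]
mean_c, var_c = cluster_estimates(clusters, M=30)
print(round(mean_c, 4), round(var_c, 4))
```

Because the invented clusters are of equal size, the result can be checked by hand against (7.47) and (7.48): the cluster means are 5, 9 and 6, the estimate is 20/3, and the variance is (27/90)(8.667/2) = 1.3.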
The use of cluster sampling in relation to mental testing is
discussed by Marks,¹² who gives a derivation of the formulas for
cluster sampling used above.


REFERENCES 

1. Cochran, W. G., “Modern Methods in the Sampling of Human Populations,” 
American Journal of Public Health, 41 (1951), 647-653. 

2. David, F. N., Probability Theory for Statistical Methods, Cambridge, 1949,
Cambridge University Press. Chapter XIV, “Sampling Human Populations.” 

3. Deming, W. Edwards, Some Theory of Sampling, New York, 1950, John Wiley 
and Sons. 

4. Deming, W. Edwards, “A Brief Statement on the Uses of Sampling in Censuses
of Population, Agriculture, Public Health, and Commerce,” United Nations 
Publications, 1948, XVII, 1. 

5. Dixon, W. J. and Massey, F. J., Introduction to Statistical Analysis, New York, 
1951, McGraw-Hill Book Company. 

6. Hansen, M. H. and Hauser, P. M., “Area Sampling - Some Principles of
Sample Design,” Public Opinion Quarterly, (Summer 1945), 183-193.

7. Hansen, M. H. and Hurwitz, W. N., “Relative Efficiencies of Various Sampling
Units in Population Inquiries,” Journal of the American Statistical Association,
37 (1942), 89-94.

8. Hansen, M. H. and Hurwitz, W. N., “On the Theory of Sampling from Finite
Populations,” The Annals of Mathematical Statistics, 14 (1943), 333-362.

9. Hansen, M. H. and Hurwitz, W. N., “The Problems of Non-Response in
Sample Surveys,” Journal of the American Statistical Association, 41 (1946),
233-236.

10. King, A. J. and Jessen, R. J., “The Master Sample of Agriculture,” Journal
of the American Statistical Association, 40 (1945), 38-56.

11. Long, John A., Motor Abilities of Deaf Children, New York, 1931, Bureau
of Publications, Teachers College, Columbia University.

12. Marks, Eli S., “Selective Sampling in Psychological Research,” Psychological
Bulletin, 44 (1947), 267-275.

13. McNemar, Quinn, Psychological Statistics, New York, 1949, John Wiley and
Sons.

14. Neyman, Jerzy, “Contribution to the Theory of Sampling Human Popula-
tions,” Journal of the American Statistical Association, 33 (1938), 101-116.

15. Neyman, Jerzy, “On Two Different Aspects of the Representative Method,”
Journal of the Royal Statistical Society, 97 (1934), 558-606.

16. Stephan, Frederick F., “Practical Problems of Sampling Procedure,” American
Sociological Review, 1 (1936), 569-580.

17. Stephan, Frederick F., “Stratification in Representative Sampling,” Journal
of Marketing, 6 (1941), 38-47.

18. Stephan, Frederick F., “History of the Uses of Modern Sampling Procedures,”
Journal of the American Statistical Association, 43 (1948), 12-39.

19. Terman, Lewis M. and Miles, Catharine C., Sex and Personality, New York,
1936, McGraw-Hill Book Company.

20. Welch, B. L., “The Generalization of ‘Student’s’ Problem when Several
Different Population Variances are Involved,” Biometrika, 34 (1947), 28-35.

21. Westover, F. L., Controlled Eye Movements versus Practice Exercises in Reading,
New York, 1946, Bureau of Publications, Teachers College, Columbia Uni-
versity.

22. Wrightstone, J. W., Appraisal of Newer Elementary School Practices, New York,
1938, Bureau of Publications, Teachers College, Columbia University.

23. Yates, Frank, Sampling Methods for Censuses and Surveys, London, 1949, Charles
Griffin and Co. Ltd.

24. Yates, Frank, “A Review of Recent Statistical Developments in Sampling
and Sampling Surveys,” Journal of the Royal Statistical Society, 109 (1946),
12-43.


8 Inferences Concerning Variances
and Standard Deviations of
Normal Populations


Not only do research workers ask questions about means of 
populations but they ask questions about their variabilities also. 
What estimate can be made of the unknown variance of a popula- 
tion from the observed variance of a sample? Does a period of 
study make a class more variable at the end than at the beginning? 
Less variable? Are boys and girls equally variable in some par- 
ticular trait? If one sample is drawn from each of several popula- 
tions, are the observed variances of these samples consistent with 
the hypothesis that the population variances are all equal? 

Such questions are similar to the questions asked concerning 
means in Chapter 7 and similar logic is used in answering them. 
However the appropriate statistics are different and will have 
different sampling distributions. The new issues to be discussed 
in this chapter therefore will relate to the selection of a statistic, 
to the identification of the form of the sampling distribution of 
that statistic, and to the reading of the appropriate probability 
tables. The general logic of making interval estimates or testing 
hypotheses on the evidence of such statistics will follow the pat- 
terns already established. 

Sampling Distribution of the Statistic (N − 1)s²/σ². In the
sampling experiment carried out in Chapter 6, each student com-
puted four values of the statistic Σx²/σ² or (N − 1)s²/σ². (See
line 9 of Worksheet I.) In Table 6.1, on page 134, and Figure
6-2, on page 134, the empirical distribution of 125 such computed
values is shown. It can be proved mathematically that the sta-
tistic

\[ \chi^2 = \frac{(N - 1)s^2}{\sigma^2} \]

has a χ² distribution with N − 1 degrees of freedom.
If a large number of students draw samples of 5 cases from the 


data of Table XXIV and compute the variance for each sample, 


180 - Variances and Standard Deviations 


what proportion of these samples would yield a variance of 1500
or more? Since σ² is known to be 1380.4 for the population, the
relevant statistic is

\[ \chi^2 = \frac{4(1500)}{1380.4} = 4.35 \]

From Table VIII it may be seen that 4.35 < χ²_{.75}. The expected
proportion of samples with s² > 1500 is therefore greater than .25.

Suppose a student reports that he obtained a variance of
s² = 423 but his computations are not available for checking;
should it be assumed that he made a mistake of some sort? His
variance is only about a third as large as the population variance.
Is the observed variance small enough to arouse suspicions con-
cerning the method of sampling or the correctness of the com-
putations?

\[ \chi^2 = \frac{4(423)}{1380.4} = 1.2 \]

For 4 degrees of freedom, Table VIII shows χ²_{.10} = 1.1. Since
10% of random samples of 5 cases from a population in which
σ² = 1380.4 would produce χ² smaller than 1.1, more than 10% of
such samples would show χ² smaller than the observed value 1.2.
These same samples would of course have s² smaller than 423,
the observed value from which χ² = 1.2 was obtained. Now if
more than 10% of samples would by chance produce a variance
smaller than the one observed, that variance cannot be considered
small enough to excite suspicion as to its correctness. Obviously,
however, this absence of suspicion does not demonstrate the cor-
rectness of the sampling and computational procedures used.
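
The two table look-ups above can be checked numerically. For 4 degrees of freedom the upper-tail probability of χ² has the closed form e^(−x/2)(1 + x/2), so no table is needed; the function name `chi2_upper_tail_4df` is only a label chosen for this sketch, and the closed form applies to 4 degrees of freedom only.

```python
from math import exp

def chi2_upper_tail_4df(x):
    """P(chi-square > x) for exactly 4 degrees of freedom (closed form)."""
    return exp(-x / 2) * (1 + x / 2)

sigma2 = 1380.4                       # population variance given in the text

chi2_large = 4 * 1500 / sigma2        # statistic for s^2 = 1500
chi2_small = 4 * 423 / sigma2         # statistic for s^2 = 423

p_large = chi2_upper_tail_4df(chi2_large)       # proportion with s^2 of 1500 or more
p_small = 1 - chi2_upper_tail_4df(chi2_small)   # proportion with s^2 of 423 or less

print(round(chi2_large, 2), round(p_large, 3))
print(round(chi2_small, 1), round(p_small, 3))
```

Both results agree with the reasoning from Table VIII: the first proportion exceeds .25 and the second exceeds .10.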

Interval Estimate for the Variance. The argument here is
similar in principle to the argument concerning the interval esti-
mate for the mean developed in Chapter 7. There is probability
.05 that (N − 1)s²/σ² > χ²_{.95} and probability .05 that
(N − 1)s²/σ² < χ²_{.05}. Therefore there is probability .90 that
(N − 1)s²/σ² will be neither larger than χ²_{.95} nor smaller than
χ²_{.05}. This relation may be expressed in general as

(8.1)    \[ P\left\{ \chi^2_{\alpha/2} < \frac{(N - 1)s^2}{\sigma^2} < \chi^2_{1-\alpha/2} \right\} = 1 - \alpha \]

What is wanted now is an inequality expressing limits for σ². This
inequality can be obtained from (8.1) in two steps, or the reader
not interested in the algebraic derivation may merely accept
Formula (8.2). The first step is to take the reciprocal of each
term in (8.1) and reverse the direction of the inequality signs. (By
way of illustration one may note that 4 < 8 < 10 and 1/4 > 1/8 > 1/10,
or 1/10 < 1/8 < 1/4. In general 1/b < 1/a if a and b are positive and a < b.)
This step produces the inequality

\[ \frac{1}{\chi^2_{1-\alpha/2}} < \frac{\sigma^2}{(N - 1)s^2} < \frac{1}{\chi^2_{\alpha/2}} \]

Each member of this inequality is now multiplied by (N − 1)s².
Since (N − 1)s² is a positive number, the direction of the signs is
not affected. The result is

\[ \frac{(N - 1)s^2}{\chi^2_{1-\alpha/2}} < \sigma^2 < \frac{(N - 1)s^2}{\chi^2_{\alpha/2}} \]

This inequality has probability 1 − α or is made with confidence
coefficient 1 − α. In general

(8.2)    \[ C\left\{ \frac{(N - 1)s^2}{\chi^2_{1-\alpha/2}} < \sigma^2 < \frac{(N - 1)s^2}{\chi^2_{\alpha/2}} \right\} = 1 - \alpha \]

Table VIII provides the values of χ² needed for obtaining interval
estimates made with confidence coefficients of .99, .98, .96, .95,
.90, .80, and .50. In a practical problem these values are likely
to meet all requirements.

Certain of these values of χ² for n = 4 are shown in Table 8.1
with the appropriate computations for each of the four samples 
of Worksheet I. Thus for Sample 1, row 7 of Worksheet I in- 
dicates that (N — 1)s? = 7635.2. This number has been multi- 
plied in turn by each of the reciprocals shown in the third column 
of Table 8.1. 
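
The routine of Formula (8.2) can be sketched with the tabulated χ² values for n = 4 quoted in Table 8.1. The dictionary `CHI2_4DF` and the function name `variance_interval` are assumptions made for this illustration. The sketch divides by χ² directly instead of multiplying by a rounded reciprocal, so an occasional result may differ from the table by one unit.

```python
# Percentile points of chi-square for 4 degrees of freedom, as quoted in Table 8.1.
CHI2_4DF = {
    0.01: 0.297, 0.05: 0.711, 0.10: 1.064, 0.20: 1.649, 0.30: 2.195,
    0.70: 4.878, 0.80: 5.989, 0.90: 7.779, 0.95: 9.488, 0.99: 13.277,
}

def variance_interval(sum_of_squares, confidence, table=CHI2_4DF):
    """Limits for sigma^2 by Formula (8.2); sum_of_squares is (N - 1) * s^2."""
    alpha = 1 - confidence
    lower_pct = round(alpha / 2, 2)        # e.g. .05 for confidence .90
    upper_pct = round(1 - alpha / 2, 2)    # e.g. .95 for confidence .90
    lower_limit = sum_of_squares / table[upper_pct]
    upper_limit = sum_of_squares / table[lower_pct]
    return lower_limit, upper_limit

# Sample 1 of Worksheet I: (N - 1) * s^2 = 7635.2
low_90, high_90 = variance_interval(7635.2, 0.90)
low_98, high_98 = variance_interval(7635.2, 0.98)
print(round(low_90), round(high_90))
print(round(low_98), round(high_98))
```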

From examination of Table 8.1 and Figure 8-1, interpretations
can be made similar to the first four and the sixth interpretation
regarding the confidence interval for the mean. However a change
must be made in the fifth statement. Whereas the sample mean
lies exactly in the middle of the confidence interval for μ, the
sample variance lies much nearer to the lower than to the upper
end of the confidence interval for σ².

If the confidence coefficient were zero the interval would nar-
row to a point and that point would be at σ² = s². If a confidence
coefficient of 1.00 were demanded, implying certainty that σ² is
in the interval, the interval would have to be infinitely wide.




TABLE 8.1 Computation of the Statistic (N − 1)s²/χ² for Each of the Four Samples
in Worksheet I at Selected Values of Chi-Square.
n = 4, σ² = 1380.4

            Tabulated   Reciprocal of       Value of (N − 1)s²/χ² for Sample
Symbol      value       tabulated         1            2            3           4
            of χ²       value         (7635.2)*   (14069.2)*   (3246.8)*   (10696)*

χ²_{.99}    13.277       .0753           575         1059          244         805
χ²_{.95}     9.488       .1054           805         1483          342        1127
χ²_{.90}     7.779       .1286           982         1810          417        1376
χ²_{.80}     5.989       .1670          1275         2350          542        1786
χ²_{.70}     4.878       .2050          1565         2884          665        2193
χ²_{.30}     2.195       .4556          3479         6410         1479        4873
χ²_{.20}     1.649       .6064          4630         8532         1969        6486
χ²_{.10}     1.064       .9398          7176        13223         3052       10053
χ²_{.05}      .711      1.4065         10739        19788         4567       15044
χ²_{.01}      .297      3.3670         25708        47371        10932       36013

* The observed value of (N − 1)s² for the sample.


[Figure 8-1. Vertical axis: estimation of σ²; reference line at σ² = 1380.4.]

Fig. 8-1. Interval estimates of σ² made with confidence coefficients
of .40, .60, .80, and .90 from each of four samples of 5 cases and from
composite sample of 20 cases. (Samples as given on Worksheet I.)
Dotted line is for composite sample. Horizontal tic is s² of sample.




EXERCISE 8.1
In Table 8.1
1. Verify those values of χ² which are available in Table VIII or in
any other table you may have at hand. Values in Table 8.1 are given
with more significant digits than can be read from Table VIII.
2. Verify some of the reciprocals in Column 3.
3. Verify some of the values in the last four columns.
4. Verify the intervals for Sample 1:
C(575 < σ² < 25708) = .98
C(805 < σ² < 10739) = .90
C(982 < σ² < 7176) = .80
C(1275 < σ² < 4630) = .60
C(1565 < σ² < 3479) = .40
5. Verify the intervals shown in Figure 8-1 from the entries in Table 8.1.
6. Compute the intervals for the composite sample and verify them
by comparison with the dotted lines in Figure 8-1. The following values
of χ² for n = 19 will be needed:


χ²_{.95} = 30.144        χ²_{.30} = 15.352
χ²_{.90} = 27.204        χ²_{.20} = 13.716
χ²_{.80} = 23.900        χ²_{.10} = 11.651
χ²_{.70} = 21.689        χ²_{.05} = 10.117


7. For samples 2, 3, and 4, obtain confidence intervals similar to those
given in question 4 for sample 1.

8. What proportion of all samples is expected to yield interval esti-
mates that actually contain σ² when 1 − α = .90? Of the 4 interval
estimates made from the 4 observed samples, how many did contain σ²?
Answer the same questions for confidence coefficients of .80, .60, and .40.

9. If the standard deviation of a population is 15.4, what is the prob-
ability that a sample of 18 cases will have a standard deviation at least as
large as 20? (Hint: Square the standard deviation and work with the
corresponding value of the variance.)

10. If a sample of 24 cases has a standard deviation of 5.6, what
interval limits can be set for the population standard deviation with
confidence of .90?


The Chi-square Distribution When the Number of Degrees
of Freedom is Large. The largest value of n found in Table VIII
is 30.* How are tests about the variance to be made if n > 30?
Figure 4-5 has already suggested that as n increases the distri-
bution of χ² becomes more symmetrical. The distribution of √χ²
or χ approaches symmetry much more rapidly than does the dis-
tribution of χ². In fact when n > 30, the distribution of χ is
approximately normal with mean √(n − .5) and standard error
√.5. Then

\[ z = \frac{\chi - \mu_\chi}{\sigma_\chi} = \frac{\chi - \sqrt{n - .5}}{\sqrt{.5}} = \frac{\chi\sqrt{2} - \sqrt{2n - 1}}{1} \]

has approximately a unit normal distribution for large n. Chang-
ing χ√2 into √(2χ²), this relationship becomes

(8.3)    \[ z = \sqrt{2\chi^2} - \sqrt{2n - 1} \]

* The volume of statistical tables soon to be published by the Biometrika office under
the editorship of E. S. Pearson and H. O. Hartley will contain a χ² table extending to
n = 40, 50, 60, 70, 80, 90, and 100.

Formula (8.3) can be used to make tests about σ, and consequently
also to make tests about σ², from a large sample.
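
A minimal sketch of a test by Formula (8.3) follows. The sample size, sample variance and hypothesized variance are invented, and `NormalDist` from the Python standard library stands in for the printed normal table.

```python
from math import sqrt
from statistics import NormalDist

def chi2_large_n_test(chi2, n):
    """Normal deviate of Formula (8.3) and the approximate upper-tail probability.

    chi2: observed value of (N - 1) * s^2 / sigma^2
    n:    degrees of freedom, assumed larger than 30
    """
    z = sqrt(2 * chi2) - sqrt(2 * n - 1)
    upper_tail = 1 - NormalDist().cdf(z)
    return z, upper_tail

# Hypothetical case: N = 51, s^2 = 130, hypothesis sigma^2 = 100.
N, s2, sigma2 = 51, 130.0, 100.0
chi2 = (N - 1) * s2 / sigma2          # 65.0 with n = 50 degrees of freedom
z, p = chi2_large_n_test(chi2, N - 1)
print(round(z, 3), round(p, 3))
```

Here z is about 1.45, so a one-sided test at the .05 level would not reject the hypothesis for these invented figures.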
Large Sample Estimate for the Standard Deviation. Formula
(8.3) can be used for this purpose, substituting χ² = (N − 1)s²/σ² so that
√(2χ²) = (s/σ)√(2N − 2), and n = N − 1 so that √(2n − 1) = √(2N − 3).
Then

(8.4)    \[ P\left\{ z_{\alpha/2} < \frac{s}{\sigma}\sqrt{2N - 2} - \sqrt{2N - 3} < z_{1-\alpha/2} \right\} = 1 - \alpha \]

By algebraic manipulation of the expression within parentheses
this statement may, without changing the probability, be rewritten
as

(8.5)    \[ P\left\{ \frac{s\sqrt{2N - 2}}{\sqrt{2N - 3} + z_{1-\alpha/2}} < \sigma < \frac{s\sqrt{2N - 2}}{\sqrt{2N - 3} + z_{\alpha/2}} \right\} = 1 - \alpha \]

The inequality within the parentheses is a confidence interval
for σ.


If N is large enough that √((2N − 2)/(2N − 3)) may be considered as 1 and
z is small in relation to √(2N − 3), a simpler inequality giving ap-
proximately equal results can be used:

(8.6)    \[ P\left\{ s\left(1 + \frac{z_{\alpha/2}}{\sqrt{2N}}\right) < \sigma < s\left(1 + \frac{z_{1-\alpha/2}}{\sqrt{2N}}\right) \right\} = 1 - \alpha \]

This is precisely the result which would be arrived at on the as-
sumption that s is normally distributed about σ with standard
error σ/√(2N), a statement which is sometimes made without
the necessary qualification that it holds only for large samples.




Example. Given N = 308, s = 2.15, to find an interval es-
timate for σ with confidence coefficient .95. Substituting in the
inequality of (8.6) produces the confidence interval

\[ 2.15\left(1 - \frac{1.96}{\sqrt{616}}\right) < \sigma < 2.15\left(1 + \frac{1.96}{\sqrt{616}}\right) \]

or 1.98 < σ < 2.32
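
The example can be reproduced, and the simpler inequality (8.6) compared with (8.5), by a short sketch. The function names are labels chosen here, and `NormalDist` from the Python standard library supplies the normal deviate that the text rounds to 1.96.

```python
from math import sqrt
from statistics import NormalDist

def sd_interval_8_5(s, N, confidence):
    """Interval for sigma from inequality (8.5)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # z for 1 - alpha/2
    numerator = s * sqrt(2 * N - 2)
    root = sqrt(2 * N - 3)
    # z for alpha/2 is -z, so the upper limit has (root - z) in its denominator
    return numerator / (root + z), numerator / (root - z)

def sd_interval_8_6(s, N, confidence):
    """Simpler large-sample interval for sigma from inequality (8.6)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    h = z / sqrt(2 * N)
    return s * (1 - h), s * (1 + h)

low_85, high_85 = sd_interval_8_5(2.15, 308, 0.95)
low_86, high_86 = sd_interval_8_6(2.15, 308, 0.95)
print(round(low_86, 2), round(high_86, 2))   # limits from (8.6)
print(round(low_85, 2), round(high_85, 2))   # limits from (8.5)
```

For N = 308 the two sets of limits differ only slightly, in the second decimal place.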


Ratio of the Variances of Two Independent Samples from
Populations with the Same Variance. In Chapter 7 on more than
one occasion a statement was made that the observed variances
of two independent samples were not inconsistent with the hy-
pothesis that the samples came from populations with the same
variance, σ₁² = σ₂² = σ².

The statistic which is appropriate for testing this hypothesis
is the ratio of the two observed variances. In general, if each of
two independent statistics has a χ² distribution and if each is
divided by its appropriate degrees of freedom, the ratio of χ₁²/n₁
to χ₂²/n₂ has a distribution called the F distribution * or the
variance-ratio distribution. As has already been seen, (N − 1)s²/σ²
has a χ² distribution.

Now if χ₁² = (N₁ − 1)s₁²/σ₁² = n₁s₁²/σ₁², then χ₁²/n₁ = s₁²/σ₁²,

and if χ₂² = (N₂ − 1)s₂²/σ₂² = n₂s₂²/σ₂², then χ₂²/n₂ = s₂²/σ₂².

Then if σ₁² = σ₂² = σ²,

\[ F = \frac{\chi_1^2 / n_1}{\chi_2^2 / n_2} = \frac{s_1^2}{s_2^2} \]

has an F distribution.

Tables of the F Distribution. This distribution is really a
family of distributions. The shape changes with $n_1$ and also with
$n_2$. In Table X values of $n_1$, or the degrees of freedom associated
with the numerator of the variance ratio, are given in the
horizontal row across the top of the table. Values of $n_2$, or the
degrees of freedom associated with the denominator of the ratio,
are given in the vertical column at the left. Each cell of the table
represents a particular combination of $n_1$ and $n_2$. For that
particular combination there exists a unique probability distribution.


* Named in honor of R. A. Fisher to whom the distribution is due. 




In this table only two points on that entire sampling distribution
are shown. These are $F_{.95}$ and $F_{.99}$ such that

$$P\{s_1^2/s_2^2 > F_{.95}\} = .05$$
$$P\{s_1^2/s_2^2 > F_{.99}\} = .01$$

In the cell for $n_1 = 4$, $n_2 = 4$, we read $F_{.95} = 6.39$ and $F_{.99} = 15.98$.
The position of $F_{.95} = 6.39$ is marked on Figure 6-7 on page 141,
but $F_{.99}$ is too far to the right to be shown. Now if $s_1^2/s_2^2 = 6.39$,
then $s_2^2/s_1^2 = 1/6.39 = .16$.

The tables do not show $F_{.05}$ and $F_{.01}$ because these can be
readily obtained from the tabulated values with $n_1$ and $n_2$ reversed.
Values at the lower end of the scale are crowded so close together
that they would have to be written with a good many digits to
afford adequate precision. It can be shown that $F_{\alpha} = 1/F_{1-\alpha}$ when
$F_{\alpha}$ is based on $n_1$ and $n_2$ degrees of freedom and $F_{1-\alpha}$ is based on
$n_2$ and $n_1$ degrees of freedom. Therefore for Figure 6-7, $F_{.05} =
1/6.39 = .16$ and $F_{.01} = 1/15.98 = .06$.

Values of $F_{.80}$, $F_{.90}$, and $F_{.999}$ as well as $F_{.95}$ and $F_{.99}$ are tabulated
in Fisher and Yates, Statistical Tables.⁵ A still more extensive
table computed by Merrington and Thompson will be included
in the volume of tables soon to be published by the Biometrika
office.⁸

When the variances of two samples are being compared to test
the hypothesis that $\sigma_1^2 = \sigma_2^2 = \sigma^2$, it is just as exceptional for $s_1^2/s_2^2$
to be less than $F_{.05}$ as it is for $s_1^2/s_2^2$ to be larger than $F_{.95}$. The
proper procedure then is to divide the larger variance by the
smaller. If the ratio exceeds $F_{.95}$, the hypothesis of equal variance
may be discarded at the .10 level; if the ratio exceeds $F_{.99}$ it is
discarded at the .02 level.

In the following chapter, a different manner of reading the 
tables will be considered, for use in a somewhat different kind of 
problem requiring a one-sided hypothesis. 

From the data on page 156 the variance of scores of 12 Barnard
students is found as $s_1^2 = 258.09$ and the variance of 11 general
studies students as $s_2^2 = 411.30$. To test the hypothesis

$$\sigma_1^2 = \sigma_2^2 = \sigma^2,$$

we have $F = 411.30/258.09 = 1.59$. In the cell of the F table for
which $n_1 = 10$ and $n_2 = 11$, we find that $F_{.95} = 2.86$. The observed
$F < F_{.95}$. Therefore, if samples of this size were drawn repeatedly
from populations with equal variance, and the variance ratio
computed for every possible pair of such samples, over 10% of such
ratios would be more exceptional than the observed ratio. The
observations cast no reasonable doubt on the hypothesis of equal
variance.
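
The two-sided procedure just described (larger variance over smaller, compared with the tabled upper 5% point) can be sketched as follows. The critical value 2.86 is copied from the text rather than computed, and the function name is an illustrative assumption.

```python
def variance_ratio(var_a, var_b):
    """Divide the larger sample variance by the smaller one."""
    return max(var_a, var_b) / min(var_a, var_b)

# Sample variances quoted in the text for the two groups of students.
f_observed = variance_ratio(258.09, 411.30)

# Tabled F.95 for 10 and 11 degrees of freedom, as read in the text.
f_95_tabled = 2.86

# Exceeding F.95 with the larger variance on top corresponds to the .10 level.
reject_at_10_level = f_observed > f_95_tabled
print(round(f_observed, 2), reject_at_10_level)
```

Putting the larger variance in the numerator is what turns the one-tailed table entry into a two-sided test at twice the tabled probability.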

Now let us consider the variances of the four samples of 5 cases
listed in Worksheet I on page 128. These provide 12 variance
ratios, namely

$s_1^2/s_2^2$ = 1908.8/3517.3 = .54        $s_3^2/s_1^2$ = 811.7/1908.8 = .426
$s_1^2/s_3^2$ = 1908.8/811.7 = 2.35        $s_3^2/s_2^2$ = 811.7/3517.3 = .23
$s_1^2/s_4^2$ = 1908.8/2674 = .71          $s_3^2/s_4^2$ = 811.7/2674 = .30
$s_2^2/s_1^2$ = 3517.3/1908.8 = 1.84       $s_4^2/s_1^2$ = 2674/1908.8 = 1.40
$s_2^2/s_3^2$ = 3517.3/811.7 = 4.33        $s_4^2/s_2^2$ = 2674/3517.3 = .76
$s_2^2/s_4^2$ = 3517.3/2674 = 1.32         $s_4^2/s_3^2$ = 2674/811.7 = 3.30

The sampling distribution of the variance ratio for $n_1 = 4$ and
$n_2 = 4$, which are the degrees of freedom associated with these
variance ratios, is presented in Figure 6-7. The reader should
plot the value of each of the 12 ratios shown above on the base
line of Figure 6-7. The largest ratio, with value 4.33, is smaller
than the indicated value of $F_{.95}$. The value of $F_{.05}$ is .16 (not
marked on the graph) and the smallest of the ratios, with value
.23, is larger than $F_{.05}$.
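
As a rough check, the twelve ordered ratios can be generated mechanically. This sketch assumes the four variances belong to samples labelled 1 to 4 in the order in which they appear above (the labels are an assumption), and takes $F_{.95} = 6.39$ for 4 and 4 degrees of freedom from the text.

```python
from itertools import permutations

# Variances of the four samples of 5 cases; the labels 1-4 are assumed.
variances = {1: 1908.8, 2: 3517.3, 3: 811.7, 4: 2674.0}

# Every ordered pair (i, j) with i != j gives one ratio s_i^2 / s_j^2.
ratios = {(i, j): variances[i] / variances[j]
          for i, j in permutations(variances, 2)}

largest = max(ratios.values())
smallest = min(ratios.values())

f_95 = 6.39       # tabled upper 5% point for n1 = 4, n2 = 4
f_05 = 1 / f_95   # lower 5% point by the reciprocal rule

print(len(ratios), round(largest, 2), round(smallest, 2))
print(all(f_05 < r < f_95 for r in ratios.values()))
```

Since the ratios come in reciprocal pairs, the smallest ratio is the reciprocal of the largest, so checking the largest against $F_{.95}$ already settles the smallest against $F_{.05}$.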

If two independent samples are drawn from populations in
which $\sigma_1^2 = \sigma_2^2 = \sigma^2$

then $P\{F_{.05} < s_1^2/s_2^2 < F_{.95}\} = .90$
     $P\{s_1^2/s_2^2 > F_{.95}\} = .05$
     $P\{s_1^2/s_2^2 < F_{.05}\} = .05$

EXERCISE 8.2


Reading Tables of the F distribution

1. Verify the following probability statements:

a. Suppose a great many samples of 10 cases each are drawn from one
population and a great many samples of 6 cases each from another, and
suppose the two populations have the same variance. Now let each sample
of 10 cases be paired with a sample of 6 cases in some random fashion
and let the ratio of their variances be computed.

Then $P\{s_{10}^2/s_6^2 > 4.78\} = .05$   See cell for which $n_1 = 9$, $n_2 = 5$
     $P\{s_6^2/s_{10}^2 > 3.48\} = .05$   See cell for which $n_1 = 5$, $n_2 = 9$

But 1/4.78 = .209 and 1/3.48 = .287

Therefore $P\{.287 < s_{10}^2/s_6^2 < 4.78\} = .90$
and       $P\{.209 < s_6^2/s_{10}^2 < 3.48\} = .90$
Similarly $P\{.165 < s_{10}^2/s_6^2 < 10.15\} = .98$
          $P\{.099 < s_6^2/s_{10}^2 < 6.06\} = .98$

b. A sample of 21 cases has a variance of 25.3 and a sample of 18 cases
has a variance of 12.2, the variance ratio being F = 25.3/12.2 = 2.07.
If the two population variances are equal there is a probability of .05 of
drawing samples such that $s_{21}^2/s_{18}^2 > 2.23$. Since $F < F_{.95}$, the probability
of drawing a pair of samples with variances as dissimilar as these is greater
than .10.

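2. Fill the blanks in the following:

                                                                    Probability of an F
       $N_1$   $N_2$   $s_1^2$   $s_2^2$    F     $F_{.95}$   $F_{.99}$   at least as exceptional
a.      41      31      6.8       9.4     ___     ___         ___         P > .10
b.      12      60     10.4       3.6     ___     ___         ___         P < .02
c.      15     120     17.4      30.6     ___     ___         ___         ___
d.      25      10     43.5       9.1     ___     ___         ___         ___
e.     210      27     33.5      15.3     ___     ___         ___         ___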


3. From the F table, read values of F for which $n_1 = 1$ and $n_2$ has the
values indicated in the column n below. Enter $F_{.95}$ and $F_{.99}$ on the
appropriate lines.

From a table of "Student's" distribution read values of $t_{.975}$ and
$t_{.995}$ for the values of n indicated. Enter these in the appropriate lines.

Square the recorded values of $t_{.975}$ and $t_{.995}$ and enter on the appropriate
lines. If you have made no mistakes $F_{.95} = t^2_{.975}$ and $F_{.99} = t^2_{.995}$ except for
a possible difference in the last digit due to rounding error.

  n     $F_{.95}$   $t_{.975}$   $t^2_{.975}$   $F_{.99}$   $t_{.995}$   $t^2_{.995}$
  5
  7
 15
 30
 ∞


4. From the F table, read values of F for which $n_2 = \infty$ and $n_1$ has the
values indicated in the column n below. Enter $F_{.95}$ and $F_{.99}$ on the
appropriate lines.

From the $\chi^2$ table, read $\chi^2_{.95}$ and $\chi^2_{.99}$ for the indicated values of n.
Enter on the appropriate lines.

Divide each recorded $\chi^2$ by the corresponding value of n and enter on
the appropriate line. If you have made no mistakes, $F_{.95} = \chi^2_{.95}/n$ and
$F_{.99} = \chi^2_{.99}/n$ except for a possible difference in the last digit due to
rounding error.

 $n_1$   $F_{.95}$   $\chi^2_{.95}$   $\chi^2_{.95}/n$   $F_{.99}$   $\chi^2_{.99}$   $\chi^2_{.99}/n$
  2
  8
 10
 20
 30

5. The values of F for $n_1 = 1$ and $n_2 = \infty$ at several significance levels
given below are taken from Statistical Tables by Fisher and Yates. Read
values of z at the same significance levels from a table of the normal
distribution and enter on the appropriate lines. Enter $z^2$. Compare with
the F values. They should agree except for differences due to rounding
error.

$F_{.80}$ = 1.64     $F_{.90}$ = 2.71     $F_{.95}$ = 3.84     $F_{.99}$ = 6.64     $F_{.999}$ = 10.83

$z_{.90}$ = ___      $z_{.95}$ = ___      $z_{.975}$ = ___     $z_{.995}$ = ___     $z_{.9995}$ = ___
$z^2_{.90}$ = ___    $z^2_{.95}$ = ___    $z^2_{.975}$ = ___   $z^2_{.995}$ = ___   $z^2_{.9995}$ = ___

Relation of the F Table to Tables of "Student's" Distribution,
the Chi-square Distribution and the Normal Distribution. As
has been said before, each cell in the F table represents particular
values of $n_1$ and $n_2$ to which a unique probability distribution
corresponds. Questions 3, 4, and 5 in Exercise 8.2 suggest that
for a certain subset of these cells the F distribution is closely
related to the chi-square distribution; for another subset it is
closely related to "Student's" distribution; for one particular cell
it is closely related to the normal distribution.

Consider the cells in the vertical column at the extreme left
of the F table. For these cells $n_1 = 1$. Exercise 8.2 has indicated
that for these cells $F_{.95} = t^2_{.975}$ and $F_{.99} = t^2_{.995}$, t having $n_2$ degrees
of freedom. Problems utilizing this relationship will be taken up
in Chapter 9.

Consider the cells in the last row of the F table, where $n_2 = \infty$.
Exercise 8.2 has indicated that for these cells $F_{.95} = \chi^2_{.95}/n$ and
$F_{.99} = \chi^2_{.99}/n$. Here n is the degrees of freedom for $\chi^2$ and the degrees
of freedom for the numerator variance in F. This interesting
relationship is easily understood. $F = s_1^2/s_2^2$. When $n_2$ approaches
infinity, $s_2^2$ approaches $\sigma^2$ and F approaches $s_1^2/\sigma^2$. But

$n_1 s_1^2/\sigma^2 = \chi^2$ so $s_1^2/\sigma^2 = \chi^2/n_1$.

Therefore when $n_2$ approaches infinity, F approaches $\chi^2/n_1$.

Consider the single cell in the last row and the left-hand column.
In Exercise 8.2, we noted that the entries for $F_{1-\alpha}$ in the
probability distribution for this cell are the squares of the entries for
$z_{1-\frac{1}{2}\alpha}$ in the normal distribution. Because this cell is in the column
in which $n_1 = 1$, we have $F_{1-\alpha} = t^2_{1-\frac{1}{2}\alpha}$. Because it is in the row for which
$n_2 = \infty$ we have $F_{1-\alpha} = \chi^2_{1-\alpha}/n = \chi^2_{1-\alpha}/1 = \chi^2_{1-\alpha}$. Then for this
single cell, $\chi^2_{1-\alpha} = t^2_{1-\frac{1}{2}\alpha}$, $\chi^2$ having 1 degree of freedom and t having
infinite degrees of freedom. This brings us back to a situation
noted in Chapter 4. There it was seen that for large values of N
it made no difference whether for the double dichotomy
$\begin{array}{|c|c|} \hline a & b \\ \hline c & d \\ \hline \end{array}$ one computed

$$\frac{(ad - bc)^2 N}{(a+b)(c+d)(a+c)(b+d)}$$

and referred it to the $\chi^2$ distribution with 1 degree of freedom or computed

$$z = \frac{p_1 - p_2}{\sqrt{\dfrac{p(1 - p)N}{N_1 N_2}}}$$

and referred it to the normal distribution. In this problem

$$\chi^2_{1-\alpha} = z^2_{1-\frac{1}{2}\alpha}.$$
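
The relation between the normal deviate and the F values for $n_1 = 1$, $n_2 = \infty$ can be checked numerically. The sketch below uses Python's standard `statistics.NormalDist` in place of a printed normal table; the F values are those quoted from Fisher and Yates in question 5 of Exercise 8.2, and the variable names are illustrative.

```python
from statistics import NormalDist

# Percentile level of F for n1 = 1, n2 = infinity, with the quoted tabled value.
f_tabled = {0.80: 1.64, 0.90: 2.71, 0.95: 3.84, 0.99: 6.64, 0.999: 10.83}

standard_normal = NormalDist()  # mean 0, standard deviation 1
z_squared = {}
for level, f_value in f_tabled.items():
    alpha = 1 - level
    # F at level 1 - alpha pairs with z at level 1 - alpha/2.
    z = standard_normal.inv_cdf(1 - alpha / 2)
    z_squared[level] = z * z
    print(f"F_{level}: tabled {f_value}, z^2 = {z_squared[level]:.3f}")
```

Small disagreements in the last printed digit reflect rounding in the published table, as the exercise anticipates.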

Comparison of Two Variances Based on Related Scores.
Suppose it is desired to compare the variance of a group at the
beginning of an experiment with the variance of the same individuals
at the end of the experiment in order to test the hypothesis that
no genuine change in variance has occurred, that $\sigma_1^2 = \sigma_2^2 = \sigma^2$.
The ratio $F = s_1^2/s_2^2$ does not provide a test for this hypothesis
because under these circumstances $s_1^2$ and $s_2^2$ are based on
correlated scores and cannot be regarded as independent variances.

Let $\sigma_1^2$ and $\sigma_2^2$ be the initial and final variance in the
population and $\rho_{12}$ the correlation between initial and final scores for the
population.

Let $s_1^2$, $s_2^2$ and $r_{12}$ be the corresponding values for the sample.
Then if $\sigma_1^2 = \sigma_2^2$, the statistic

$$t = \frac{(s_2^2 - s_1^2)\sqrt{N - 2}}{2 s_1 s_2 \sqrt{1 - r_{12}^2}} \tag{8.7}$$

has "Student's" distribution with $N - 2$ degrees of freedom. An
equivalent form of (8.7) which may be more convenient to
compute is

$$t = \frac{(\sum x_2^2 - \sum x_1^2)\sqrt{N - 2}}{2\sqrt{\sum x_1^2 \sum x_2^2 - (\sum x_1 x_2)^2}} \tag{8.8}$$

An illustration may be drawn from the Westover study
previously mentioned.¹² For the 45 freshmen using special practice
exercises, the standard deviations of the Total Comprehension
Scores at the beginning and end of the remedial period were
$s_1 = 4.05$ and $s_2 = 11.45$. The correlation between initial and final
scores was .39. Is this a significant change in variability?

$$t = \frac{\{(11.45)^2 - (4.05)^2\}\sqrt{43}}{2(4.05)(11.45)\sqrt{1 - (.39)^2}} = 8.8$$

The change in variance is so large that no reasonable investigator
would claim it was due to chance.

The value r = .39 is the value published by Westover as the
correlation between initial and final scores for the entire group of
140 subjects. No value for the correlation coefficient was quoted
for the 45 students considered in this problem. Knowing that
r = .39 may be in error one way or the other, we should explore as
far as possible the consequences of such error. Suppose the
correct value of r is larger than .39. Then $\sqrt{1 - r^2}$ will be even
smaller than $\sqrt{1 - (.39)^2}$, the denominator will be smaller than
the one we have computed and t will be even larger than 8.8.
The final decision to regard the change in variance as due to
systematic rather than chance causes would be strengthened even
further. Now suppose the correct value is smaller than r = .39.
Then the denominator would be larger and t smaller than the one
computed here. The extreme value in this direction would occur
if r were 0. In that case

$$t = \frac{\{(11.45)^2 - (4.05)^2\}\sqrt{43}}{2(4.05)(11.45)} = 8.1$$

and the interpretation is not altered. If t were near the critical
value, an error in the size of r might have serious consequences.
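
The sensitivity argument can be replayed numerically. This sketch assumes the statistic $t = (s_2^2 - s_1^2)\sqrt{N-2} \,/\, (2 s_1 s_2 \sqrt{1 - r^2})$ with $N - 2$ degrees of freedom; the function name and argument names are illustrative, not taken from the text.

```python
import math

def correlated_variance_t(s1, s2, r, n_cases):
    """t for comparing two variances computed on the same individuals."""
    numerator = (s2 ** 2 - s1 ** 2) * math.sqrt(n_cases - 2)
    denominator = 2 * s1 * s2 * math.sqrt(1 - r ** 2)
    return numerator / denominator

# Westover example: 45 freshmen, s1 = 4.05, s2 = 11.45, r taken as .39.
t_reported = correlated_variance_t(4.05, 11.45, 0.39, 45)

# Most conservative case discussed in the text: r = 0.
t_if_r_zero = correlated_variance_t(4.05, 11.45, 0.0, 45)

print(round(t_reported, 1), round(t_if_r_zero, 1))
```

For fixed standard deviations, t is smallest in absolute value when r = 0, which is why that case bounds the effect of an error in r.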

The Variances of Several Samples from Populations Having
the Same Variance. When several samples are observed, one
often wants to know whether it is reasonable to believe that all
their populations have the same mean $\mu_1 = \mu_2 = \cdots = \mu_k$.
That hypothesis will be explored in Chapter 9. One also often
wants to know whether it is reasonable to believe that all their
populations have the same variance, $\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2 = \sigma^2$.
That hypothesis can be considered at this point.

There are several different methods by which this hypothesis
can be tested, some of which are described in the references.
We shall describe two of these methods: (A) a method of great
simplicity and convenience which can be used when all samples
have the same number of cases, but which requires a special
probability table, and (B) a method which can be used even when
the number of cases is not uniform from sample to sample, which
makes use of the chi-square table, but which is arithmetically
more laborious than the first.

A. All samples of uniform size. To illustrate this method we
may use the data of Table 9.2, page 200. The variances for the
three groups of 11 individuals each are as follows:

$s_A^2 = 1188.97$     $s_B^2 = 2837.47$     $s_C^2 = 2140.40$

We wish to test the hypothesis $\sigma_A^2 = \sigma_B^2 = \sigma_C^2 = \sigma^2$. There are
6 possible F ratios among these three variances, and the largest
of these ratios is $\dfrac{s_B^2}{s_A^2} = \dfrac{2837.47}{1188.97} = 2.39$. If this largest ratio is not
significantly large, none of the others is. Hartley⁷ calls this
largest F ratio

$$F_{max} = \frac{s^2_{max}}{s^2_{min}} \tag{8.9}$$

Table VII gives the upper 5% and 1% points for $F_{max}$ in a set of
k mean squares all based on n degrees of freedom. In the data
of Table 9.2, k = 3 and n = 10. In the appropriate cell of Table
VII we find the entry 4.85, indicating that

$$P\{F_{max} > 4.85\} = .05$$

The critical region is $F_{max} > 4.85$. The observed value $F_{max} = 2.39$
does not fall in the critical region and so the hypothesis

$$\sigma_A^2 = \sigma_B^2 = \sigma_C^2$$

is acceptable.
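
The largest-ratio computation is easily sketched. The critical value 4.85 is the entry the text reads from Table VII for k = 3 and n = 10; it is copied here, not computed, and the function name is an illustrative assumption.

```python
def f_max(variances):
    """Largest sample variance divided by the smallest."""
    variances = list(variances)
    return max(variances) / min(variances)

# Variances of the three archery groups, each with 10 degrees of freedom.
group_variances = {"A": 1188.97, "B": 2837.47, "C": 2140.40}

observed = f_max(group_variances.values())
critical_5_percent = 4.85  # read from the special table in the text

in_critical_region = observed > critical_5_percent
print(round(observed, 2), in_critical_region)
```

Only the two extreme variances enter the statistic, which is the source of both its simplicity and its need for a special table.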

B. Not all samples of uniform size. Harris⁶ administered a
test of information concerning the social factors of the community
to teachers in four St. Louis schools, and found the means and
variances to be as follows:

School            $N_i$    $n_i$    $s_i^2$    $\bar{X}_i$
1. Jefferson       20       19      11.22      12.50
2. Banneker        22       21      10.30      10.40
3. Cole            22       21       6.20      11.91
4. Marshall        15       14       9.79      11.94

                   79

She wanted to know whether the 79 teachers could be combined
into a single group or whether these school differences would throw
suspicion on the hypotheses

    $\mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu$
and $\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \sigma_4^2 = \sigma^2$

Applying the method of section A, we note that the ratio of the
largest variance to the smallest is

$$F_{max} = \frac{11.22}{6.20} = 1.81$$

Here k = 4 and n varies from 14 to 21. Examination of Table VII
indicates that the critical value decreases as n increases.
Therefore if for the largest value of n the observed $F_{max}$ is not in the
critical region we shall know it would not be in the critical region
for a smaller n. Here we find that for

k = 4 and n = 20, the critical region is $F_{max} > 3.25$
and even for
k = 4 and n = 30, the critical region is $F_{max} > 2.59$

Clearly the observed value is not in the critical region and the
hypothesis $\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \sigma_4^2$ is sustained.

If the Hartley test described above had resulted in the
decision to reject the hypothesis when Table VII was entered with
the largest n and to accept when entered with the smallest n, as
might have happened, a different test allowing for variations in
n would be needed. The great simplicity of the Hartley test
makes its use advisable whenever it gives unambiguous results.
In other situations Bartlett's¹ test to be described below may be
used.

Let $s_1^2, s_2^2, \ldots, s_k^2$ be the variances of k independent samples
having respectively $n_1, n_2, \ldots, n_k$ degrees of freedom. Then under
the hypothesis that $\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2 = \sigma^2$ the estimate of $\sigma^2$
obtained by pooling the variances of the k samples is

$$s^2 = \frac{n_1 s_1^2 + n_2 s_2^2 + \cdots + n_k s_k^2}{n_1 + n_2 + \cdots + n_k} = \frac{\sum n_i s_i^2}{n} \quad \text{where } n = \sum n_i \tag{8.10}$$

We compute the statistic

$$B' = 2.3026\left\{ n\left(\log_{10} \sum_{i=1}^{k} n_i s_i^2 - \log_{10} n\right) - \sum_{i=1}^{k} n_i \log_{10} s_i^2 \right\}, \qquad B = B'/C \tag{8.11}$$

$$\text{where } C = 1 + \frac{1}{3(k - 1)}\left(\sum_{i=1}^{k} \frac{1}{n_i} - \frac{1}{n}\right) \tag{8.12}$$

Even if some of the $n_i$ are quite small, say 5 or more, the statistic
B has a chi-square distribution with $k - 1$ degrees of freedom. It
is not always necessary to compute C. If B' is not significantly
large, B will be even smaller because C is always larger than 1.
Therefore C need be computed only if B' appears significantly
large.

The procedure known as the Bartlett test for homogeneity of 
variance is shown in Table 8.2 where it is applied to the variances 
of the same four St. Louis schools discussed at the beginning of 
this section. The reader who is not familiar with logarithms will 
need to refer to an algebra text or to reference eleven. 


TABLE 8.2 Computation to Test the Homogeneity of Variance of Teachers in
4 St. Louis Schools on a Test of Information about the Community.*

School           $N_i$   $n_i$   $n_i s_i^2$   $s_i^2$   $\log_{10} s_i^2$   $n_i \log_{10} s_i^2$   $1/n_i$
1. Jefferson      20      19      213.2        11.22      1.0500              19.9500                .0526
2. Banneker       22      21      216.3        10.30      1.0124              21.2604                .0476
3. Cole           22      21      130.2         6.20      0.7924              16.6404                .0476
4. Marshall       15      14      137.1         9.79      0.9908              13.8712                .0714

Sum               79      75      696.8                   3.8456              71.7220                .2192

$\log_{10} \sum n_i s_i^2 = \log_{10} 696.8 = 2.8431$
$\log_{10} \sum n_i = \log_{10} 75 = 1.8751$
Difference = 0.9680

B' = 2.3026 {75(0.9680) - 71.7220}
   = 2.3026(0.8800)
   = 2.026

$C = 1 + \dfrac{1}{3(3)}\left(.2192 - \dfrac{1}{75}\right) = 1.0229$          $B = \dfrac{B'}{C} = \dfrac{2.026}{1.0229} = 1.98$

* From Harris, Teachers' Social Knowledge and its Relation to Pupils' Responses,
page 22.

The statistic B = 1.98 is referred to a chi-square table with
k - 1 = 3 degrees of freedom, where $\chi^2_{.90}$ is found to be 6.3. As
$B < \chi^2_{.90}$ the evidence does not refute the hypothesis that

$$\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \sigma_4^2 = \sigma^2.$$

This conclusion is in agreement with the conclusion previously
reached by reference to the Hartley Table.
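
The Bartlett computation for the four schools can be sketched in a few lines. The sketch assumes the pooled-variance form of the statistic, $B' = 2.3026\{n \log_{10} s^2 - \sum n_i \log_{10} s_i^2\}$ with $s^2 = \sum n_i s_i^2 / n$, and $C = 1 + \frac{1}{3(k-1)}(\sum 1/n_i - 1/n)$; the function name is illustrative. Because it carries full-precision logarithms instead of four-place table values, its result may differ from the worked table in the second decimal place.

```python
import math

def bartlett(variances, dfs):
    """Return (B', C, B) for k sample variances with their degrees of freedom."""
    k = len(variances)
    n = sum(dfs)
    pooled = sum(d * v for d, v in zip(dfs, variances)) / n
    weighted_logs = sum(d * math.log10(v) for d, v in zip(dfs, variances))
    # math.log(10) is the factor 2.3026 that converts common to natural logs.
    b_prime = math.log(10) * (n * math.log10(pooled) - weighted_logs)
    c = 1 + (sum(1 / d for d in dfs) - 1 / n) / (3 * (k - 1))
    return b_prime, c, b_prime / c

# Variances and degrees of freedom for the four St. Louis schools.
b_prime, c, b = bartlett([11.22, 10.30, 6.20, 9.79], [19, 21, 21, 14])

chi_square_90 = 6.25  # upper 10% point for 3 degrees of freedom (6.3 in the text)
print(round(b_prime, 2), round(c, 4), round(b, 2), b < chi_square_90)
```

Since C always exceeds 1, the correction can only reduce the statistic, so the function could return early whenever B' is already below the critical value.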

Cochran² has proposed a statistic which is the ratio of the
largest of the variances in k independent samples each having n
degrees of freedom to the sum of all the variances and has derived
its sampling distribution. This statistic is meaningful and simple
to compute, but a special table is needed for its sampling
distribution. Since this book is not attempting to present all the useful
sampling distributions but only those which are needed most often,
Cochran's test is mentioned here for the information of the reader,
but without necessary tables. These tables may be found on pages
390-391 in Techniques of Statistical Analysis.⁴

Bartlett's test is useful to detect lack of homogeneity in general,
of any sort. It is not so sensitive as Cochran's test for detecting
lack of homogeneity arising because the variance of one
population is considerably larger than the variances of the others.

The 5% and 1% significance level of the numerator of Bartlett's
statistic have been tabulated by Thompson and Merrington.¹⁰


REFERENCES

1. Bartlett, M. S., "Properties of Sufficiency and Statistical Tests," Proceedings
   of the Royal Society of London, A, 160 (1937), page 268.
2. Cochran, W. G., "The Distribution of the Largest of a Set of Estimated
   Variances as a Fraction of their Total," Annals of Eugenics, 11 (1941), 47-52.
3. Dixon, W. J. and Massey, F. J., Jr., Introduction to Statistical Analysis, New
   York, 1951, McGraw-Hill Book Company, Inc. See index under "variance."
4. Eisenhart, C., Hastay, M. W., and Wallis, W. A., Techniques of Statistical
   Analysis, New York, 1947, McGraw-Hill Book Company, Inc. Chapter 15,
   "Significance of the Largest of a Set of Sample Estimates of Variance."
5. Fisher, R. A. and Yates, F., Statistical Tables for Biological, Agricultural and
   Medical Research, New York, 1948, Hafner Publishing Company, 3d Ed.
6. Harris, Ruth, Teachers' Social Knowledge and its Relation to Pupils' Responses,
   a Study of Four St. Louis Negro Elementary Schools, New York, 1941, Bureau
   of Publications, Teachers College, Columbia University.
7. Hartley, H. O., "The Maximum F-Ratio as a Short-cut Test for Heterogeneity
   of Variance," Biometrika, 37 (December, 1950), 308-312.
8. Merrington, M. and Thompson, C. M., "Tables of Percentage Points of the
   Inverted Beta (F) Distribution," Biometrika, 33 (1943), 73-88.
9. Mood, A. M., Introduction to the Theory of Statistics, New York, 1950,
   McGraw-Hill Book Company, Inc., pages 267-270.
10. Thompson, C. M. and Merrington, M., "Tables for Testing the Homogeneity
    of a Set of Estimated Variances," Biometrika, 33 (June, 1946), 296-304.
11. Walker, H. M., Mathematics Essential for Elementary Statistics, New York,
    1951, Henry Holt and Company, 2d Ed. Chapter 17, "Logarithms."
12. Westover, F. L., Controlled Eye Movements Versus Practice Exercises in Reading,
    New York, 1946, Bureau of Publications, Teachers College, Columbia
    University.


9 Analysis of Variance 


Methods for testing hypotheses concerning means of two
populations were considered in Chapter 7. The present chapter
will deal with hypotheses concerning the means of several
populations. The method for testing the hypothesis $\mu_1 = \mu_2$ presented
on page 156 will be shown to be a special case of the method for
testing the hypothesis $\mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k$ to be presented in
the following pages.

Use of Double Subscripts. In a great many statistical prob- 
lems, in fact in most of those which will be discussed in the re- 
mainder of this book, observations need to be classified on two or 
more variables simultaneously. As the classification becomes more 
complex it is helpful to have an explicit symbolism to indicate 
exactly which values contribute to a particular sum or mean or 
variance. In such situations clarity may be promoted by the use 
of double (or multiple) subscripts, as in the following rectangular 
arrangement. 


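$$\begin{array}{ccccccc}
X_{11} & X_{12} & X_{13} & X_{14} & X_{15} & X_{16} & X_{17} \\
X_{21} & X_{22} & X_{23} & X_{24} & X_{25} & X_{26} & X_{27} \\
X_{31} & X_{32} & X_{33} & X_{34} & X_{35} & X_{36} & X_{37}
\end{array}$$

Each entry here represents a numerical score. The symbol $X_{23}$
is read "X two three" or "X sub two three," not "X twenty-three."
If the arrangement represents the scores of 7 persons on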
3 tests, the entries in any one vertical column are the 3 scores 
made by one person, the scores in any one horizontal row are the 
scores of 7 persons on one test. If this arrangement represents 
3 samples of 7 individuals each, the entries in the first column 
represent the scores of the first individual drawn in each sample, 
the entries in the first row represent the scores of the 7 individuals 
of the first sample. Innumerable other interpretations might be 
made of such an arrangement. A rectangular arrangement of 
numbers is called a matrix. 

Assuming that the arrangement represents the scores of 7
persons on 3 tests, study the following parallels:

$X_{26}$ represents a score on Test 2 made by the sixth person

$X_{i3}$ represents a score made by person 3 on one of the tests, on an
unspecified test

$X_{2j}$ represents a score on Test 2 made by some person, an
unspecified person

$X_{ij}$ represents a score on one of the tests made by one of the
persons, neither test nor person being specified.

Then the sum of the scores

in the first row is $\sum_{j=1}^{7} X_{1j}$ and in some row is $\sum_{j=1}^{7} X_{ij}$

in the first column is $\sum_{i=1}^{3} X_{i1}$ and in some column is $\sum_{i=1}^{3} X_{ij}$

The sum of all 21 scores is $\sum_{i=1}^{3} \sum_{j=1}^{7} X_{ij}$

The first letter or digit in the subscript customarily indicates the
row; the second letter or digit indicates the column. In situations
such as these omission of the variable subscripts and limits of
summation begets confusion. If one wrote $\sum X$ for data from this
matrix there would be no way of telling whether the expression
means the sum for a row (and which row?), the sum for a column
(and which column?), or the sum for the entire matrix.

We also need a symbolism which will permit us to distinguish
between the mean of the 2d row which is $\frac{1}{7}\sum_{j=1}^{7} X_{2j}$ and the mean of
the 2d column which is $\frac{1}{3}\sum_{i=1}^{3} X_{i2}$. The important aspect of this
symbolism is to be able to distinguish whether 2 is the first
subscript or the second. To make this point clear it is customary
to write $\bar{X}$ with subscripts similar to the subscripts for X in the
summation but with a dot replacing the variable subscript over
which summation has taken place. Thus in general

$\frac{1}{10}\sum_{i=1}^{10} X_{i3}$ would be $\bar{X}_{.3}$          $\frac{1}{r}\sum_{i=1}^{r} X_{ij}$ would be $\bar{X}_{.j}$

$\frac{1}{40}\sum_{j=1}^{40} X_{4j}$ would be $\bar{X}_{4.}$          $\frac{1}{c}\sum_{j=1}^{c} X_{ij}$ would be $\bar{X}_{i.}$

$\frac{1}{rc}\sum_{i=1}^{r}\sum_{j=1}^{c} X_{ij}$ would be $\bar{X}_{..}$

If the meaning is clear without it, the dot should be omitted.


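A similar convention may be applied to the variance and the
standard deviation. Thus

$\frac{1}{9}\sum_{i=1}^{10} (X_{i3} - \bar{X}_{.3})^2$ would be written $s^2_{.3}$

$\frac{1}{39}\sum_{j=1}^{40} (X_{4j} - \bar{X}_{4.})^2$ would be written $s^2_{4.}$

$\frac{1}{r-1}\sum_{i=1}^{r} (X_{ij} - \bar{X}_{.j})^2$ would be written $s^2_{.j}$

$\sqrt{\frac{1}{c-1}\sum_{j=1}^{c} (X_{ij} - \bar{X}_{i.})^2}$ would be written $s_{i.}$

$\sqrt{\frac{1}{rc-1}\sum_{i=1}^{r}\sum_{j=1}^{c} (X_{ij} - \bar{X}_{..})^2}$ would be written $s_{..}$

Often $\bar{X}_{..}$, $s^2_{..}$ and $s_{..}$ are written merely as $\bar{X}$, $s^2$ and $s$ when
omission of the dots creates no confusion.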


EXERCISE 9.1

Write out all entries for a matrix of 4 rows and 3 columns, using X
with appropriate subscripts for each entry.

Write the expression for each row total in the matrix just written, that
for the first row being $\sum_{j=1}^{3} X_{1j}$.

Write the expression for each column total in the matrix, that for the
first column being $\sum_{i=1}^{4} X_{i1}$.

Write the expression for the total for all 12 entries.

Write an expression for each row mean, the mean for the first row
being $\bar{X}_{1.} = \frac{1}{3}\sum_{j=1}^{3} X_{1j}$.

Write an expression for each column mean, the mean for the first
column being $\bar{X}_{.1} = \frac{1}{4}\sum_{i=1}^{4} X_{i1}$.

Write an expression for the mean of the entire matrix.

Write an expression for the variance of each row, the variance for the
first row being $s^2_{1.} = \frac{1}{2}\sum_{j=1}^{3} (X_{1j} - \bar{X}_{1.})^2$.

Write an expression for the variance of each column, the variance for
the first column being $s^2_{.1} = \frac{1}{3}\sum_{i=1}^{4} (X_{i1} - \bar{X}_{.1})^2$.

Write an expression for the variance of all 12 entries.

Write expressions for the mean and the variance of an unspecified row,
the mean and the variance of an unspecified column.

Check what you have written against the symbols recorded in Table 9.1.


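TABLE 9.1 Symbols for Mean and Variance of a Subgroup

$$\begin{array}{l|ccc|c|c|c}
 & \multicolumn{3}{c|}{\text{Individual scores}} & \text{Row total} & \text{Row mean} & \text{Row variance} \\ \hline
 & X_{11} & X_{12} & X_{13} & \sum_{j=1}^{3} X_{1j} & \bar{X}_{1.} = \frac{1}{3}\sum_{j=1}^{3} X_{1j} & s^2_{1.} = \frac{1}{2}\sum_{j=1}^{3} (X_{1j} - \bar{X}_{1.})^2 \\
 & X_{21} & X_{22} & X_{23} & \sum_{j=1}^{3} X_{2j} & \bar{X}_{2.} = \frac{1}{3}\sum_{j=1}^{3} X_{2j} & s^2_{2.} = \frac{1}{2}\sum_{j=1}^{3} (X_{2j} - \bar{X}_{2.})^2 \\
 & X_{31} & X_{32} & X_{33} & \sum_{j=1}^{3} X_{3j} & \bar{X}_{3.} = \frac{1}{3}\sum_{j=1}^{3} X_{3j} & s^2_{3.} = \frac{1}{2}\sum_{j=1}^{3} (X_{3j} - \bar{X}_{3.})^2 \\
 & X_{41} & X_{42} & X_{43} & \sum_{j=1}^{3} X_{4j} & \bar{X}_{4.} = \frac{1}{3}\sum_{j=1}^{3} X_{4j} & s^2_{4.} = \frac{1}{2}\sum_{j=1}^{3} (X_{4j} - \bar{X}_{4.})^2 \\ \hline
\text{Column total} & \sum_{i=1}^{4} X_{i1} & \sum_{i=1}^{4} X_{i2} & \sum_{i=1}^{4} X_{i3} & & & \\
\text{Column mean} & \bar{X}_{.1} = \frac{1}{4}\sum_{i=1}^{4} X_{i1} & \bar{X}_{.2} = \frac{1}{4}\sum_{i=1}^{4} X_{i2} & \bar{X}_{.3} = \frac{1}{4}\sum_{i=1}^{4} X_{i3} & & & \\
\text{Column variance} & s^2_{.1} = \frac{1}{3}\sum_{i=1}^{4} (X_{i1} - \bar{X}_{.1})^2 & s^2_{.2} = \frac{1}{3}\sum_{i=1}^{4} (X_{i2} - \bar{X}_{.2})^2 & s^2_{.3} = \frac{1}{3}\sum_{i=1}^{4} (X_{i3} - \bar{X}_{.3})^2 & & &
\end{array}$$

Row total for an unspecified row $= \sum_{j=1}^{3} X_{ij}$

Column total for an unspecified column $= \sum_{i=1}^{4} X_{ij}$

Sum of column totals $= \sum_{j=1}^{3}\sum_{i=1}^{4} X_{ij}$          Sum of row totals $= \sum_{i=1}^{4}\sum_{j=1}^{3} X_{ij}$

Mean of entire group $= \bar{X} = \dfrac{\sum_{i=1}^{4} \bar{X}_{i.}}{4} = \dfrac{\sum_{j=1}^{3} \bar{X}_{.j}}{3} = \dfrac{\sum_{i=1}^{4}\sum_{j=1}^{3} X_{ij}}{12}$

Variance of entire group $= s^2 = \dfrac{\sum_{i=1}^{4}\sum_{j=1}^{3} (X_{ij} - \bar{X})^2}{11}$

Sum of row totals = sum of column totals.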
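
The double-subscript bookkeeping of Table 9.1 maps directly onto a list of rows. The sketch below uses a made-up matrix of 4 rows and 3 columns (the scores are hypothetical, chosen only for illustration) and the standard-library `statistics.variance`, which divides by one less than the number of entries, as the table does.

```python
from statistics import mean, variance

# Hypothetical 4 x 3 matrix: X[i][j] is the entry in row i+1, column j+1.
X = [
    [3, 5, 7],
    [2, 4, 9],
    [6, 6, 6],
    [1, 8, 3],
]
columns = list(zip(*X))  # transpose: one tuple per column

row_totals = [sum(row) for row in X]           # sum over j, i fixed
col_totals = [sum(col) for col in columns]     # sum over i, j fixed
row_means = [mean(row) for row in X]           # X-bar with subscript "i."
col_means = [mean(col) for col in columns]     # X-bar with subscript ".j"
row_vars = [variance(row) for row in X]        # divisor 3 - 1 = 2
col_vars = [variance(col) for col in columns]  # divisor 4 - 1 = 3

all_scores = [x for row in X for x in row]
grand_mean = mean(all_scores)      # X-bar for the entire group
grand_var = variance(all_scores)   # divisor 12 - 1 = 11

print(row_totals, col_totals, sum(row_totals) == sum(col_totals))
```

Fixing the first index and running over the second gives row quantities; fixing the second and running over the first gives column quantities, which is exactly what the dot notation records.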




Comparison of Three Means. As an illustration of a problem
in which the means of three populations are to be compared,
consider an experiment carried out by Schroeder to study factors
influencing performance in archery. The standard "Round" in
archery consists of sets of 6 shots at each of three ranges, 30, 40,
and 50 yards, shot in that order. One of the goals of Schroeder's
study was to find out whether order of shooting the different
ranges has an effect on performance. If the ranges are shot in
non-standard order will the scores be comparable with scores
obtained from the standard series?


TABLE 9.2 Archery Scores at the 50-Yard Range Made by
33 College Women.*

Group          Individual scores                     $\sum X$   $\bar{X}$   $\sum X^2$   $(\sum X)^2/N$   $\sum (X - \bar{X})^2$

A              99, 114, 51, 78, 134, 71, 66, 33,       977       88.82        98665        86775.3          11889.7
               146, 80, 105

B              104, 26, 136, 87, 187, 34, 106, 63,     995       90.45       118377        90002.3          28374.7
               155, 68, 29

C              41, 83, 61, 189, 88, 80, 141, 112,     1221      111.0        156935       135531.0          21404.0
               141, 112, 173

Sum                                                   3193                   373977       312308.6          61668.4
Entire group                                          3193       96.76       373977       308946.9          65030.1

* Data from Schroeder.


A sample of 33 subjects was chosen and divided randomly 
into 3 equal subsamples. Each group of 11 subjects was given 
6 lessons in archery, the order of shooting the ranges being changed 
from lesson to lesson. (Note that there are 6 possible orders, be- 
cause the number of arrangements of 3 things is 6.) Now con- 
sider the performance at the 50-yard range when this range was 
shot first by Group A, second by B, and third by C. Table 9.2 
presents the resulting scores, each score there being the sum of 
the performance scores of one individual in two lessons. The 
means of the three groups seem to suggest that the range shot last 
has the advantage. However before concluding that the advan- 
tage is real, existing in the population of all possible observations, 
it is necessary to determine that the variation among the three 
observed means is too great to be attributed to the chance effects 
of sampling. 




Note that if the study had been set up to compare scores made
on the range shot last with scores on either of the other ranges,
it would be satisfactory to combine the scores of Group A and B
into one group of 22 cases, which for convenience we may call D,
and to test the hypothesis that $\mu_C = \mu_D$ by the methods of Chapter
7. However the experiment was not set up to test that hypothesis
but to test the hypothesis $\mu_A = \mu_B = \mu_C$. When observation of the
data suggests a new hypothesis, that new hypothesis can never be
satisfactorily tested on the data from which it was obtained but
requires a new set of observations.

Mathematical Model. To specify the mathematical
population (that is to provide a mathematical model), we shall assume
that each of the three samples is drawn at random from a normal
population and that all three populations have the same variance
but not necessarily the same mean.

Population     Mean        Variance
A              $\mu_A$     $\sigma^2$
B              $\mu_B$     $\sigma^2$
C              $\mu_C$     $\sigma^2$

For precision, the assumption that all three samples were 
drawn from normal populations with the same variance needs to 
be investigated. Application of the tests of normality described 
in Chapter 5 indicates that the assumption of normality may be 
accepted. (Usually it is satisfactory to look at the grouped fre- 
quency distribution of scores and to decide more or less intuitively 
that the distribution appears fairly normal.) The assumption of 
equality of variance can be tested by comparing the three sample 
variances by the method described in Chapter 8. This test in- 
dicates that the assumption of equality of variance may also be 
accepted. 

Hypothesis to be Tested. The hypothesis to be tested is

$$\mu_A = \mu_B = \mu_C = \mu$$

If this hypothesis is true and the assumption of normal
populations with equal variance is justified, then the 33 scores may be
regarded as random observations from a single normal population
with mean $\mu$ and variance $\sigma^2$. The means of the three samples
would then be random observations from a normal population of
sample means for which $E(\bar{X}) = \mu$ and $\sigma_{\bar{X}}^2 = \dfrac{\sigma^2}{N} = \dfrac{\sigma^2}{11}$, as explained
in Chapter 7.


The unknown variance $\sigma^2$ can be estimated from the variation
among the three sample means. It can also be estimated from
the variation among the archers within groups. The ratio of these
two estimates of $\sigma^2$ provides the statistic by means of which the
hypothesis can be tested.

Estimate of $\sigma^2$ from Variation Among Means. The value of
$\sigma_{\bar{X}}^2 = \dfrac{\sigma^2}{N}$, which in this problem is $\dfrac{\sigma^2}{11}$, can be estimated by
computing the variance of the means shown in Table 9.2 around the
mean of the entire group. These three means may be treated as
a sample of three individuals, so that their variance has 3 - 1 = 2
degrees of freedom. Then

$$s_{\bar{X}}^2 = \frac{\sum_{i=1}^{3} (\bar{X}_{i.} - \bar{X}_{..})^2}{2} = \frac{\sum (\bar{X}_i - \bar{X})^2}{2}$$

is an unbiased estimate of $\sigma_{\bar{X}}^2 = \dfrac{\sigma^2}{11}$ and therefore

$$11 s_{\bar{X}}^2 = \frac{11 \sum (\bar{X}_i - \bar{X})^2}{2}$$

is an unbiased estimate of $\sigma^2$ if the population means are equal.
In general if there are k samples of N cases each,

$$N s_{\bar{X}}^2 = \frac{N \sum_{i=1}^{k} (\bar{X}_{i.} - \bar{X}_{..})^2}{k - 1} = \frac{N \sum_{i=1}^{k} (\bar{X}_i - \bar{X})^2}{k - 1} \tag{9.1}$$

is the estimate of $\sigma^2$ obtained from variation among the means. It
is usually called the "mean square between group means," or the
"mean square for variation between means" or the "mean square
between groups" or "mean square for means."

The numerator of Formula (9.1) is the sum of squares between 
groups. The denominator is the number of degrees of freedom. 

Applying Formula (9.1) to the data of Table 9.2 we obtain

$11 s_{\bar X}^2 = \frac{11}{2}\left[(88.82 - 96.76)^2 + (90.45 - 96.76)^2 + (111.0 - 96.76)^2\right] = \frac{3362}{2} = 1681.$
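The arithmetic above can be checked with a short sketch. It assumes only the three group totals quoted later in the text (977, 995, 1221) and N = 11 cases per group; the variable names are illustrative and not part of the original. It computes the mean square between group means from the definition in Formula (9.1) and from the totals-based shortcut of Formula (9.3).

```python
# Mean square between group means, computed two ways.
# The totals are those quoted in the text for Table 9.2; names are illustrative.
group_totals = [977, 995, 1221]
n_per_group = 11                      # N, cases in each group
k = len(group_totals)                 # number of groups

group_means = [t / n_per_group for t in group_totals]
grand_mean = sum(group_totals) / (n_per_group * k)

# Formula (9.1): N times the sum of squared deviations of the means, over k - 1.
ss_between = n_per_group * sum((m - grand_mean) ** 2 for m in group_means)
ms_between = ss_between / (k - 1)

# Formula (9.3): the same quantity computed from totals only.
ms_between_shortcut = (
    k * sum(t ** 2 for t in group_totals) - sum(group_totals) ** 2
) / (n_per_group * k * (k - 1))

print(round(ss_between, 1), round(ms_between, 1), round(ms_between_shortcut, 1))
```

Working from unrounded means gives about 1680.8, which agrees with the rounded hand computation of 1681.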


Computing routines which are algebraically equivalent to that 
of Formula (9.1) but which demand less arithmetic labor and pro- 
duce smaller rounding errors are furnished by Formulas (9.2) 
and (9.3). 


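(9.2)   $N s_{\bar X}^2 = \frac{\frac{1}{N}\sum_{j=1}^{k}\left(\sum_{i=1}^{N} X_{ij}\right)^2 - \frac{1}{Nk}\left(\sum_{j=1}^{k}\sum_{i=1}^{N} X_{ij}\right)^2}{k - 1}$

(9.3)   $N s_{\bar X}^2 = \frac{k\sum_{j=1}^{k}\left(\sum_{i=1}^{N} X_{ij}\right)^2 - \left(\sum_{j=1}^{k}\sum_{i=1}^{N} X_{ij}\right)^2}{Nk(k - 1)}$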


Applying these formulas to the data of Table 9.2 we have

by (9.2)   $N s_{\bar X}^2 = \frac{\frac{1}{11}(977^2 + 995^2 + 1221^2) - \frac{1}{33}(3193)^2}{2} = 1680.8$

and by (9.3)   $N s_{\bar X}^2 = \frac{3(977^2 + 995^2 + 1221^2) - 3193^2}{11(3)(2)} = 1680.8$

Estimate of $\sigma^2$ from Variation within Groups. Formula (8.10)
was stated on page 193 as providing an estimate of $\sigma^2$ based upon
the pooled variance of several groups. In the present problem
the estimate of $\sigma^2$ thus obtained would be

$\frac{(N_1 - 1)s_1^2 + (N_2 - 1)s_2^2 + (N_3 - 1)s_3^2}{N_1 + N_2 + N_3 - 3} = \frac{10 s_1^2 + 10 s_2^2 + 10 s_3^2}{11 + 11 + 11 - 3}$

This is usually called the "mean square within groups." In general
if there are k groups of N cases each, $\sigma^2$ is estimated as

(9.4)   $s^2 = \frac{\sum_{j=1}^{k}\sum_{i=1}^{N} (X_{ij} - \bar X_j)^2}{Nk - k}$

The numerator of (9.4) is the sum of squares within groups. The
denominator is the number of degrees of freedom.

Computing routines which are equivalent to that of Formula
(9.4) but which involve less arithmetic and less rounding error are
provided by Formulas (9.5) and (9.6).

(9.5)   $s^2 = \frac{\sum_{j=1}^{k}\sum_{i=1}^{N} X_{ij}^2 - \frac{1}{N}\sum_{j=1}^{k}\left(\sum_{i=1}^{N} X_{ij}\right)^2}{(N - 1)k}$

(9.6)   $s^2 = \frac{N\sum_{j=1}^{k}\sum_{i=1}^{N} X_{ij}^2 - \sum_{j=1}^{k}\left(\sum_{i=1}^{N} X_{ij}\right)^2}{N(N - 1)k}$
Applying Formula (9.4) to the data of Table 9.2 we have

$s^2 = \frac{11889.7 + 28374.7 + 21404.0}{33 - 3} = \frac{61668.4}{30} = 2055.6$

This computation is simple only because the numerator had already
been worked out in Table 9.2. Applying Formulas (9.5)
and (9.6) we have

$s^2 = \frac{373977 - \frac{1}{11}(977^2 + 995^2 + 1221^2)}{(10)(3)} = 2055.6$

and   $s^2 = \frac{11(373977) - (977^2 + 995^2 + 1221^2)}{11(10)(3)} = 2055.6$
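As a check on the two shortcut routines, the following sketch evaluates Formulas (9.5) and (9.6) from the quantities quoted in the text: the sum of all squared scores (373977) and the three group totals. The variable names are illustrative assumptions.

```python
# Mean square within groups from the shortcut Formulas (9.5) and (9.6).
sum_of_squared_scores = 373977        # sum of X squared over all 33 scores
group_totals = [977, 995, 1221]
n_per_group = 11                      # N
k = len(group_totals)

sum_of_squared_totals = sum(t ** 2 for t in group_totals)

# Formula (9.5)
ss_within = sum_of_squared_scores - sum_of_squared_totals / n_per_group
ms_within_95 = ss_within / ((n_per_group - 1) * k)

# Formula (9.6): same value, with the division postponed to the end.
ms_within_96 = (
    n_per_group * sum_of_squared_scores - sum_of_squared_totals
) / (n_per_group * (n_per_group - 1) * k)

print(round(ss_within, 1), round(ms_within_95, 1), round(ms_within_96, 1))
```

Postponing the division, as (9.6) does, keeps the intermediate values whole numbers, which is where the saving in rounding error comes from when working by hand.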


The Variance Ratio. We have found two estimates of $\sigma^2$,
namely 1681 and 2056. Estimates computed in this way have the
property that they are independently distributed. See page 208
for further discussion of this property. A third estimate of $\sigma^2$
may be obtained from the deviations of all 33 scores around the
grand mean $\bar X$ but this estimate is not distributed independently
of either of the other estimates and therefore cannot be used in
the F ratio about to be described.

In Chapter 8 it was said that any ratio of the form $(N - 1)s^2/\sigma^2$
or $ns^2/\sigma^2$ has a chi-square distribution with n = N - 1 degrees of
freedom. It was also said in that chapter that if $s_1^2$ and $s_2^2$ are two
independent estimates of the same variance $\sigma^2$, the ratio $s_1^2/s_2^2$
has an F distribution with $n_1$ degrees of freedom for the numerator
and $n_2$ for the denominator. Now since $n_1 s_1^2/\sigma^2$ may be called
$\chi_1^2$ and $n_2 s_2^2/\sigma^2$ called $\chi_2^2$, $s_1^2/\sigma^2$ may be called
$\chi_1^2/n_1$ and $s_2^2/\sigma^2$ called $\chi_2^2/n_2$. Then

$\frac{s_1^2}{s_2^2} = \frac{\chi_1^2/n_1}{\chi_2^2/n_2} = \frac{\chi_1^2}{\chi_2^2} \cdot \frac{n_2}{n_1}$

and any expression which can be reduced to this form has an F
distribution if $\chi_1^2/n_1$ and $\chi_2^2/n_2$ come from independent estimates of
the same variance. Therefore in the problem under consideration
the ratio of the mean square between group means to the mean
square within groups is distributed as F with 2 degrees of freedom
for the numerator and 30 for the denominator, or in general with
k - 1 degrees of freedom for the numerator and Nk - k for the
denominator:


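(9.7)   $F = \frac{\text{mean square between group means}}{\text{mean square within groups}} = \frac{\text{sum of squares between groups}}{\text{sum of squares within groups}} \cdot \frac{Nk - k}{k - 1}$

(9.8)   or   $F = \frac{N\sum_{j} (\bar X_j - \bar X)^2}{\sum_{j}\sum_{i} (X_{ij} - \bar X_j)^2} \cdot \frac{Nk - k}{k - 1}$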


Analysis of the Archery Data. The computations already made 
from the data of Table 9.2 may now be summarized in Table 
9.3, which is in the general form customarily used for such prob- 
lems. Usually the value read from an F table with which the 
observed F is to be compared would also be included in the table, 
but as that has not yet been discussed it will not be shown here. 
Interpretation of the value F = .82 depends upon the selection of 
a critical region and will be discussed in the following section. 


TABLE 9.3 Analysis of Variance of 33 Archery Scores

Source of variation             Sum of     Degrees of   Mean      F
                                squares    freedom      square

Total group                     65030      32
Between group means              3362       2           1681      .82
Among archers within groups     61668      30           2056
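The entries of Table 9.3 can be assembled in one pass from the summary quantities already quoted. The sketch below is one possible arrangement, not the authors' routine; the function name `one_way_anova` and its argument names are hypothetical, and it assumes groups of equal size.

```python
def one_way_anova(group_totals, n_per_group, sum_of_squared_scores):
    """Sums of squares, degrees of freedom, mean squares and F
    for k groups of equal size, from totals and the sum of X squared."""
    k = len(group_totals)
    correction = sum(group_totals) ** 2 / (n_per_group * k)
    ss_total = sum_of_squared_scores - correction
    ss_between = sum(t ** 2 for t in group_totals) / n_per_group - correction
    ss_within = ss_total - ss_between          # the partition of Formula (9.9)
    df_between = k - 1
    df_within = n_per_group * k - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_total": ss_total,
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "F": ms_between / ms_within,
    }


table = one_way_anova([977, 995, 1221], 11, 373977)
for name, value in table.items():
    print(name, round(value, 2))
```

The within-groups sum of squares is obtained here by subtraction, which is why only the total and between-groups sums need to be computed directly.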


Critical Region. From Table X in the Appendix, a critical
region with either $\alpha = .05$ or $\alpha = .01$ can be obtained. For testing
the hypothesis under consideration, the variance ratio has two
degrees of freedom for the numerator mean square ($n_1 = 2$) and
30 for the denominator ($n_2 = 30$). The corresponding cell of the
table contains the entries 3.32 and 5.39. From these it is to be
understood that

$P(F > 3.32 \mid n_1 = 2, n_2 = 30) = .05$
$P(F > 5.39 \mid n_1 = 2, n_2 = 30) = .01$

or   $P(F < 3.32 \mid n_1 = 2, n_2 = 30) = .95$
$P(F < 5.39 \mid n_1 = 2, n_2 = 30) = .99$

or   $F_{.95} = 3.32$ and $F_{.99} = 5.39$

Thus for $\alpha = .05$ the region of acceptance is F < 3.32 and the region
of rejection is F > 3.32; while for $\alpha = .01$ the region of acceptance
is F < 5.39 and the region of rejection is F > 5.39. It may be
assumed that values precisely 5.39 and 3.32 will never occur, but
values equal to these within a given margin of rounding error may
appear. The research worker should make up his mind before he
sees the results of his computations whether such values shall be
considered to fall in the region of acceptance or of rejection. If he
decides to place them in the region of rejection, he would define
that region as $F \ge 5.39$ for $\alpha = .01$, and similarly for
$\alpha = .05$.

The observed value F = .82 falls well within the region of ac- 
ceptance and one must conclude that the data are consistent with 
the hypothesis that scores obtained at the 50-yard range are not 
influenced by the order in which that range is shot within the round. 
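The tabled critical values can be reproduced without a table in the special case of two degrees of freedom for the numerator. The sketch below relies on a standard closed form of the F distribution that is not given in this text, namely P(F > f) = (1 + 2f/n2)^(-n2/2) when n1 = 2; the function names are illustrative.

```python
def f_upper_tail_prob(f, n2):
    """P(F > f) for an F distribution with 2 and n2 degrees of freedom.
    Closed form valid only when the numerator has 2 degrees of freedom."""
    return (1 + 2 * f / n2) ** (-n2 / 2)


def f_critical(alpha, n2):
    """Value exceeded with probability alpha, again for numerator df = 2."""
    return (n2 / 2) * (alpha ** (-2 / n2) - 1)


crit_05 = f_critical(0.05, 30)
crit_01 = f_critical(0.01, 30)
p_observed = f_upper_tail_prob(0.82, 30)

print(round(crit_05, 2), round(crit_01, 2))
print(round(p_observed, 2))
```

The two critical values agree with the tabled 3.32 and 5.39, and the upper-tail probability of the observed F = .82 is far larger than .05, in line with the conclusion that the value falls in the region of acceptance.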

In the present test the critical region has been placed in the 
upper tail of the distribution while for the problems considered in 
Chapter 8 it was in both tails. The difference in the logic of the 
two situations needs to be examined. 

In Chapter 8 the values of $s_1^2$ and $s_2^2$ used in the variance ratio
were obtained from two independent samples which by hypothesis
came from populations having the same variance. Too
great a discrepancy between $s_1^2$ and $s_2^2$ is damaging to that
hypothesis regardless of which variance is larger. Therefore the critical
region must include both tails of the distribution so that the
hypothesis will be rejected if

either $s_1^2/s_2^2$ is very small or $s_1^2/s_2^2$ is very large
and this is the same as if
$s_2^2/s_1^2$ is very large or $s_2^2/s_1^2$ is very small

The procedure is to place the larger variance in the numerator and
to consider the probabilities associated with the two-sided critical
region to be not $\alpha$ but $2\alpha$. Significance levels read from Table X
for such problems will be .02 and .10. The study of Figures 9-2
and 9-3 on pages 208 and 209 will further clarify this point.

In the problem under consideration in the present chapter,
the two variances are obtained from the same sample. The variance
within groups estimated by Formulas (9.4), (9.5), or (9.6),
is an unbiased estimate of $\sigma^2$ whether the hypothesis tested is true
or not. If the hypothesis that $\mu_1 = \mu_2 = \cdots = \mu_k = \mu$ is true, the
variance between groups is also an unbiased estimate of $\sigma^2$. However
if the groups have different means in the population, the
mean square between groups will have an expectation larger than $\sigma^2$.




Form of the F Distribution. As this curve has a different
form for every possible pair of numbers $n_1$ and $n_2$, the attempt to
visualize it is not very rewarding. It is always skewed with range
from 0 to infinity. Negative values of F cannot occur because both
numerator and denominator are necessarily positive. The highest
point on the F curve occurs at $F = \frac{n_2(n_1 - 2)}{n_1(n_2 + 2)}$ but that fact is not
very important. Its mean is at $F = \frac{n_2}{n_2 - 2}$ which is slightly above
1.0. Falsity of the hypothesis $\mu_1 = \mu_2 = \cdots = \mu_k$ cannot possibly
make the mean square between means smaller than $\sigma^2$ and must
make it greater. Therefore in problems of this type the mean
square between groups is always placed in the numerator and the
mean square within groups in the denominator regardless of
which is the larger. Then very large values of F throw doubt on
the hypothesis but very small ones do not. As large values of F
occur in the upper or right-hand tail of the sampling distribution
of F, the critical region is in the right tail only. Significance levels
obtained from Table X will be .01 and .05.
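The position of the mode and mean can be checked numerically for the two curves pictured in Figures 9-2 and 9-3. This sketch assumes the standard expressions for the F distribution, mode n2(n1 - 2) / (n1(n2 + 2)) for n1 > 2 and mean n2 / (n2 - 2) for n2 > 2; the function names are illustrative.

```python
def f_mode(n1, n2):
    """Highest point of the F curve; defined here for n1 > 2."""
    return n2 * (n1 - 2) / (n1 * (n2 + 2))


def f_mean(n2):
    """Mean of the F distribution; defined here for n2 > 2."""
    return n2 / (n2 - 2)


# The curve of Figure 9-2 (n1 = 4, n2 = 20) and of Figure 9-3 (n1 = 20, n2 = 4).
for n1, n2 in [(4, 20), (20, 4)]:
    print(n1, n2, round(f_mode(n1, n2), 2), round(f_mean(n2), 2))
```

For n1 = 4 and n2 = 20 this gives a mode of about .45 and a mean of about 1.11, the values listed with Figure 9-2. The mean depends only on the denominator degrees of freedom and approaches 1 as n2 grows.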


Fig. 9-1. Sampling distribution of F when $n_1 = 3$
and $n_2 = 16$ with empirical distribution of the 34
sample values. (Horizontal axis: scale of F, from 0 to 4.0.)


One may ask what can produce a very small F, since failure
of the hypothesis cannot make it small. When a suspiciously
small F occurs one asks if computations are correct, if the cases 
were selected at random, if the data were in any sense artificial 
or improperly gathered. For example, instructors have been 
known to make up examples from artificial data yielding F’s that 
are improbably small. If there appears to be nothing wrong with 
the data or the computation, one merely ascribes the situation 
to chance. When F < 1, it is customary to say at once that the 
hypothesis is accepted without referring to probability tables. 




However, if F is very small one may suspect that it represents a
non-chance situation and may wish to make a test of significance.
The procedure then is to take the reciprocal of F and to compare it
with the tabular entry for which $n_2$ is the degrees of freedom for
the numerator and $n_1$ for the denominator. For example for the
Schroeder data F = .82, so 1/F = 1.22. In the cell for which
$n_1 = 30$ and $n_2 = 2$ we find $F_{.95} = 19.46$, and $F_{.99} = 99.47$. If,
therefore, we had the complete distribution for the variance ratio
with $n_1 = 2$ and $n_2 = 30$, it would show us that

$F_{.01} = 1/99.47 = .01005$,   $F_{.05} = 1/19.46 = .0514$,   $F_{.95} = 3.32$,   $F_{.99} = 5.39$

while the complete distribution for the variance ratio with $n_1 = 30$
and $n_2 = 2$ would show

$F_{.01} = 1/5.39 = .19$,   $F_{.05} = 1/3.32 = .301$,   $F_{.95} = 19.46$,   $F_{.99} = 99.47$
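The reciprocal relation between the two tails can be verified for this pair of degrees of freedom. The sketch assumes the standard closed-form quantile of the F distribution with 2 degrees of freedom in the numerator, which is not stated in this text; the function name is illustrative. The lower percentage points it yields for $n_1 = 2$, $n_2 = 30$ are compared with the reciprocals of the tabled upper points for $n_1 = 30$, $n_2 = 2$.

```python
def f_quantile_numerator_df_2(p, n2):
    """Value below which a proportion p of the F distribution lies,
    for 2 and n2 degrees of freedom (closed form for numerator df = 2)."""
    return (n2 / 2) * ((1 - p) ** (-2 / n2) - 1)


lower_05 = f_quantile_numerator_df_2(0.05, 30)   # lower 5 percent point
lower_01 = f_quantile_numerator_df_2(0.01, 30)   # lower 1 percent point

# Reciprocals of the upper points tabled for 30 and 2 degrees of freedom.
reciprocal_05 = 1 / 19.46
reciprocal_01 = 1 / 99.47

print(round(lower_05, 4), round(reciprocal_05, 4))
print(round(lower_01, 5), round(reciprocal_01, 5))
```

The small remaining differences come from the two-decimal rounding of the tabled entries 19.46 and 99.47.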


It may interest the reader to see pictures of the curve for selected
values of $n_1$ and $n_2$. Figure 6-6 on page 139 shows the form of
the curve for $n_1 = 4$ and $n_2 = 4$. Figure 9-1 shows the form for
$n_1 = 3$ and $n_2 = 16$ and also the empirical distribution of 34 sample
values of F obtained by students in a statistics class. These are
the values of the fraction A/B obtained on row 14 of Worksheet
III in Chapter 6.

Fig. 9-2. Sampling distribution of F when $n_1 = 4$ and $n_2 = 20$.
(Horizontal axis: value of F, from 0 to 6, with the points a, b, B, and A marked.)

5% of area lies above F = 2.87, at A
5% of area lies below F = .17, at a
20% of area lies above F = 1.65, at B
20% of area lies below F = .41, at b
57% of area lies below F = 1.
Mode = .45     Mean = 1.11     Median = .9

Figure 9-2 shows the curve for $n_1 = 4$ and $n_2 = 20$ while Figure
9-3 shows the curve for $n_1 = 20$ and $n_2 = 4$. These two
curves may now be studied together. If 9-2 represents the distribution
of $F = s_1^2/s_2^2$ then 9-3 represents the distribution of
$F' = 1/F = s_2^2/s_1^2$. Either curve alone shows the probability for
every possible value of the ratio between the two variances. However
ratios smaller than 1 would be closely crowded on the base
line between 0 and 1 and would have to be given in the table
with a large number of decimal places in order to distinguish
probability levels. The segment of the horizontal axis in Figure
9-2 to the right of the point 1 represents the same situations as
the segment of the horizontal axis of 9-3 to the left of 1 and vice
versa. Together the two segments to the right of 1 represent all
possible situations. Therefore it is satisfactory to tabulate only
the right-hand segment of each curve, and this is the reason that
Table X contains no entries smaller than 1.

Fig. 9-3. Sampling distribution of F when $n_1 = 20$ and $n_2 = 4$.
(Horizontal axis: value of F, from 0 to 6, with the points a', b', B', and A' marked.)

5% of area lies above F = 5.80, at A'
5% of area lies below F = .35, at a'
20% of area lies above F = 2.45, at B'
20% of area lies below F = .61, at b'
43% of area lies below F = 1.
Mode = .6     Mean = 2.0     Median = 1.1

Independence of the Components of the Sum of Squares. The
two components into which the sum of squares has been analyzed
have the surprising property of being independently distributed.
This property of independence holds in spite of the fact that the
two components are based on the same measures. As these two
components of the sum of squares are proportional to the two
estimates of the variance, those estimates also are independently
distributed. The word "independent" implies that if many samples
of the same size from the same population are examined, the value
of one of these components in a sample is in no way predictive
of the value of the other. By way of illustration, there are available
computations made by 46 students of the variance between
means of 4 samples of 5 cases and the variance among individuals
in these same samples.

These two components have been distributed in a scatter dia- 
gram in Figure 9-4. The correlation is r = — .05, a value which is not 
significantly different from zero by the test described in Chapter 10. 

Unless the two components of variance are independent, the 
ratio of the related variances does not have an F distribution. 
The algebraic relationship of Formula (9.9) holds for all samples 
regardless of the population from which they were obtained but 
this is a relationship within a sample and does not establish inde- 
pendence from sample to sample. 
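The classroom illustration can be imitated by simulation. The sketch below is a hypothetical experiment, not the students' data: it draws 2000 sets of 4 samples of 5 cases from a single normal population (mean 100 and standard deviation 10 are arbitrary choices), computes the two components of the sum of squares for each set, and correlates them. All names and the seed are assumptions.

```python
import random


def components(groups):
    """Return (sum of squares between means, sum of squares within groups)."""
    scores = [x for g in groups for x in g]
    grand_mean = sum(scores) / len(scores)
    means = [sum(g) / len(g) for g in groups]
    between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return between, within


def pearson_r(xs, ys):
    """Product-moment correlation of two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(0)            # fixed seed so the run is repeatable
pairs = []
for _ in range(2000):
    groups = [[rng.gauss(100, 10) for _ in range(5)] for _ in range(4)]
    pairs.append(components(groups))

r = pearson_r([b for b, _ in pairs], [w for _, w in pairs])
print(round(r, 3))
```

A correlation near zero is what independence leads one to expect, although a zero correlation by itself would not prove independence.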


Fig. 9-4. Joint frequency distribution of the sum of
squares among means $5\sum(\bar X_j - \bar X)^2$ and the sum of
squares within groups $\sum\sum(X_{ij} - \bar X_j)^2$ from 46 samples
of 5 cases. r = -.05. (Scatter diagram; the cell frequencies are not
recoverable from this copy.)


Algebraic Relations in Analysis of Variance. While the computing
procedure described in the preceding pages can be carried
out when numbers in the subsamples are equal, slight modifications
are necessary if those numbers are unequal. Moreover
certain algebraic relations among the sums of squares can be used
to simplify the computations considerably. Those relations will
be described in the following paragraphs.

The double subscript notation described at the beginning of
this chapter is particularly useful for analysis of variance. Let
$X_{ij}$ represent the ith observation in group j, or subsample j. Because
the formulas look rather long and imposing, though they
are not essentially difficult, we shall introduce the symbol T as
the sum or total of a set of scores.

$\sum_{i=1}^{N_1} X_{i1} = T_1$   and   $\bar X_1 = T_1/N_1$

$\sum_{i=1}^{N_j} X_{ij} = T_j$   and   $\bar X_j = T_j/N_j$

$\sum_{j=1}^{k} N_j = N$

$\sum_{j=1}^{k}\sum_{i=1}^{N_j} X_{ij} = T$   and   $\bar X = T/N$

Then for any one subsample, say the jth, the sum of the squares
of the deviations of individual observations from the subsample
mean is

$\sum_{i=1}^{N_j} (X_{ij} - \bar X_j)^2 = \sum_{i=1}^{N_j} X_{ij}^2 - \frac{T_j^2}{N_j}$

For brevity this expression will be called the sum of squares.
The total sum of squares can be partitioned as follows:

(9.9)   $\sum_{j=1}^{k}\sum_{i=1}^{N_j} (X_{ij} - \bar X)^2 = \sum_{j=1}^{k}\sum_{i=1}^{N_j} (X_{ij} - \bar X_j)^2 + \sum_{j=1}^{k} N_j(\bar X_j - \bar X)^2$
Equation (9.9) is very important; it is in fact the basic relationship
underlying the process of "analysis of variance" described in this
chapter. The total variation described by the sum of squares of
all N cases around the total mean has been broken up into two
components each of which is expressive of a type of variation.
Thus it is the sum of squares rather than the variance which is
analyzed. The sum of squares can be analyzed into meaningful
components in a number of different ways, some to be described
here and some in subsequent chapters. In Formula (9.9) the
component $\sum_{j=1}^{k}\sum_{i=1}^{N_j} (X_{ij} - \bar X_j)^2$ represents variation among
individuals within groups. It would be zero only if all individuals in
each group had the same score. The component $\sum_{j=1}^{k} N_j(\bar X_j - \bar X)^2$
represents variation between group means. It would be zero only
if all samples had the same mean.
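The partition of the total sum of squares holds for any set of numbers, whatever the group sizes. The sketch below checks it on three small groups of unequal size; the scores are made up for the illustration and the variable names are assumptions.

```python
# Hypothetical scores in three groups of unequal size.
groups = [
    [4, 6, 5, 5, 8],
    [4, 5, 11, 9, 14, 10, 10],
    [17, 8, 19, 8],
]

all_scores = [x for g in groups for x in g]
grand_mean = sum(all_scores) / len(all_scores)
group_means = [sum(g) / len(g) for g in groups]

# Left side of the partition: deviations of every score from the grand mean.
ss_total = sum((x - grand_mean) ** 2 for x in all_scores)

# Right side: within-groups component plus between-means component.
ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))

# Shortcut for one subsample: sum of X squared minus T squared over N.
first = groups[0]
ss_first_direct = sum((x - group_means[0]) ** 2 for x in first)
ss_first_shortcut = sum(x * x for x in first) - sum(first) ** 2 / len(first)

print(round(ss_total, 4), round(ss_within + ss_between, 4))
print(round(ss_first_direct, 4), round(ss_first_shortcut, 4))
```

Because the identity is purely algebraic, the two printed totals agree for any data; it says nothing by itself about how the components behave from sample to sample.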

Some additional algebraic relationships are useful to simplify
calculations.

A. When the number of cases is the same for all groups, so that
$N_1 = N_2 = \cdots = N_k = M$, Formulas (9.10), (9.11), and (9.12)
may be used for the sums of squares:

(9.10) Total:   $\sum_{j=1}^{k}\sum_{i=1}^{M} (X_{ij} - \bar X)^2 = \sum_{j=1}^{k}\sum_{i=1}^{M} X_{ij}^2 - \frac{T^2}{kM}$

(9.11) Between* means:   $M\sum_{j=1}^{k} (\bar X_j - \bar X)^2 = \frac{1}{M}\sum_{j=1}^{k} T_j^2 - \frac{T^2}{kM}$

(9.12) Within groups:   $\sum_{j=1}^{k}\sum_{i=1}^{M} (X_{ij} - \bar X_j)^2 = \sum_{j=1}^{k}\sum_{i=1}^{M} X_{ij}^2 - \frac{1}{M}\sum_{j=1}^{k} T_j^2$

B. When the number of cases differs from group to group, Formulas
(9.13), (9.14), and (9.15) will be needed for the sums of
squares.

(9.13) Total:   $\sum_{j=1}^{k}\sum_{i=1}^{N_j} (X_{ij} - \bar X)^2 = \sum_{j=1}^{k}\sum_{i=1}^{N_j} X_{ij}^2 - \frac{T^2}{N}$

(9.14) Between means:   $\sum_{j=1}^{k} N_j(\bar X_j - \bar X)^2 = \sum_{j=1}^{k} \frac{T_j^2}{N_j} - \frac{T^2}{N}$

(9.15) Within groups:   $\sum_{j=1}^{k}\sum_{i=1}^{N_j} (X_{ij} - \bar X_j)^2 = \sum_{j=1}^{k}\sum_{i=1}^{N_j} X_{ij}^2 - \sum_{j=1}^{k} \frac{T_j^2}{N_j}$
Computation Procedures Applicable when Subgroups are not
of Uniform Size. Data from a study by Harrington¹ provide an
illustration. He made a study of the recommendations written
for a selected group of applicants for teaching positions in different
high school subjects, to determine what characteristics of these
recommendations, if any, are associated with success in obtaining a
position. The number of recommendations varied from candidate
to candidate. "Successful" candidates were those who were
elected to the position for which they made application.
"Unsuccessful" candidates were those not elected to these same
positions, although they might subsequently have been elected
to other positions. In all, 59 independent similar positions were
available, providing 59 successful and 134 unsuccessful candidates,
for whom there were 1144 recommendations.

* Whenever the authors can remember to do so, they are using the phrase "between
groups" or "between means" whether there are two groups or more than two. This
flouting of usual grammatical convention regarding "between" and "among" is deliberate.
Experience indicates that use of the phrase "between means" when there are two
samples and "among means" when there are more than two introduces a verbal
complexity which causes some students to confuse the latter phrase with the phrase "among
individuals within groups."

The recommendations were scored on a large number of traits.
The data presented here relate to length of recommendation, a
score being the "number of items commented on not unfavorably"
in the recommendation. Table 9.4 shows the basic data for the
6 successful candidates and Table 9.6 for the 14 unsuccessful
candidates, selected from the entire group to illustrate procedures.


TABLE 9.4 Scores on Length of Recommendations Written
for 6 Successful Candidates for Teaching Positions*

Candidate   Recommendation scores       N    $\sum X$   $\sum X^2$
1           9 12 9 19 6                 5     55        703
2           10 (others illegible)       5     47        473
3           4 6 5 5 8                   5     28        166
4           4 5 11 9 14 10 10           7     63        639
5           8 17 8 19                   4     52        778
6           (illegible)                 4     27        239
Total                                  30    272       2998

$\frac{T^2}{N} = \frac{(272)^2}{30} = 2466.13$

Total sum of squares = 2998 - 2466.13 = 531.87
Sum of squares between means = $(55)^2/5 + (47)^2/5 + (28)^2/5 + (63)^2/7 + (52)^2/4 + (27)^2/4 - T^2/N$ = 2628.85 - 2466.13 = 162.72
Sum of squares within groups = 531.87 - 162.72 = 369.15

* Data from Harrington.


TABLE 9.5 Analysis of Variance of Scores on Length of Recommendations for
6 Successful Candidates for Teaching Positions

Source of         Sum of     Degrees of   Mean      F       $F_{.95}$
variation         squares    freedom      square

Total             531.87     29
Between means     162.72      5           32.54     2.12    2.62
Within groups     369.15     24           15.38


Since the F obtained in Table 9.5 is less than $F_{.95}$ the hypothesis
that the means of recommendation scores do not differ among the
6 successful candidates may be accepted.
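The computation for groups of unequal size needs only each group's number of cases and total, together with the sum of all squared scores. The sketch below applies Formulas (9.13), (9.14), and (9.15) to the summary figures quoted for the six successful candidates; the variable names are illustrative.

```python
# Summary figures for the 6 successful candidates, as quoted in the text.
group_sizes = [5, 5, 5, 7, 4, 4]            # N_j
group_totals = [55, 47, 28, 63, 52, 27]     # T_j
sum_of_squared_scores = 2998                # sum of X squared over all scores

n_total = sum(group_sizes)                  # N = 30
grand_total = sum(group_totals)             # T = 272
k = len(group_sizes)
correction = grand_total ** 2 / n_total     # T squared over N

ss_total = sum_of_squared_scores - correction                                   # (9.13)
ss_between = sum(t ** 2 / n for t, n in zip(group_totals, group_sizes)) - correction  # (9.14)
ss_within = ss_total - ss_between                                               # (9.15)

ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)
f_ratio = ms_between / ms_within

print(round(ss_total, 2), round(ss_between, 2), round(ss_within, 2))
print(round(ms_between, 2), round(ms_within, 2), round(f_ratio, 2))
```

The degrees of freedom are k - 1 = 5 between means and N - k = 24 within groups, since N now denotes the total number of cases rather than the number per group.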




Recommendation scores for 14 unsuccessful candidates are
shown in Table 9.6. The student should complete the analysis
and should verify that F = 1.31 and that $F_{.95}$ = 1.9. Hence the
hypothesis that the mean length of recommendations does not differ
among unsuccessful candidates may be accepted.


TABLE 9.6 Scores on Length of Recommendations Written for 14 Unsuccessful 
Candidates for Teaching Positions 


Candidate   Recommendation scores   N   $\sum X$   $\sum X^2$
1 12 5 20 11 24 
2 (TOT 14 
3 9 3 6 620 4 8 6 
4 28 10 7 8 138 6 
5 S. T Ay 
6 25-8 ВТ 
7 W 14 7 10 6 1b 6 9 
8 415 4 15 8 
9 ih 805 100 7 59:584 6 
10 6 9 10 4 
11 5 4 6 2 9 15 
12 24 20 9 9 
13 8/517. Ter 95 10 
14 4 522 7 
Total 77 723 9009 


The information just obtained about successful and unsuccess- 
ful candidates may be used to advantage in pursuing one of the 
purposes of Harrington’s study, namely to determine whether 
successful candidates differ from unsuccessful in mean length of 
recommendation. Since we accept the hypothesis that the means 
of candidates within the groups are equal all recommendations of 
each group can be regarded as coming from a single population. 
Hence the problem of testing the hypothesis that the mean recom- 
mendation length of successful candidates is equal to the mean 
recommendation length of unsuccessful may be reduced to a prob- 
lem in analysis of variance with two populations. The data may 
now be treated as consisting of two subsamples, one of 30 and one 
of 77 recommendations. The calculation follows: 


$\frac{T^2}{N} = \frac{(272 + 723)^2}{30 + 77} = \frac{(995)^2}{107} = 9252.57$

Total sum of squares = (2998 + 9009) - 9252.57 = 2754.43

Sum of squares between means
= $(272)^2/30 + (723)^2/77$ - 9252.57 = 2.25

Sum of squares within groups = 2754.43 - 2.25 = 2752.18


As we are now testing the hypothesis that the means of two
populations are equal, the t test for comparing the means of two
independent samples might have been used. The value of t
obtained by Formula (7.23), page 156, would be $\sqrt{.086} = .29$ since
$t = \sqrt{F}$ when F has 1 degree of freedom in the numerator.

The calculation of F is usually simpler than that for obtaining
t directly. However, the F table shows values for $F_{.95}$ and $F_{.99}$
only while the table of t gives a greater range of values. It is a
satisfactory procedure when comparing the means of two independent
samples, to compute F, take its square root, and refer
the result to the table of "Student's" distribution.


TABLE 9.7 Analysis of Variance for Comparing Length of Recommendations
of 6 Successful and 14 Unsuccessful Candidates

Source of         Sum of      Degrees of   Mean      F       $F_{.95}$
variation         squares     freedom      square

Total             2754.43     106
Between means        2.25       1           2.25     .086    3.94
Within groups     2752.18     105          26.21
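The two-group comparison and the relation between t and F can be traced in a few lines. The sketch uses only the group sizes, totals, and sums of squared scores quoted in the text for the 30 and 77 recommendations; the variable names are illustrative.

```python
import math

# Successful and unsuccessful candidates treated as two subsamples.
group_sizes = [30, 77]
group_totals = [272, 723]
sum_of_squared_scores = 2998 + 9009

n_total = sum(group_sizes)
correction = sum(group_totals) ** 2 / n_total

ss_total = sum_of_squared_scores - correction
ss_between = sum(t ** 2 / n for t, n in zip(group_totals, group_sizes)) - correction
ss_within = ss_total - ss_between

ms_between = ss_between / 1                 # two groups, so 1 degree of freedom
ms_within = ss_within / (n_total - 2)
f_ratio = ms_between / ms_within

# With 1 degree of freedom in the numerator, t is the square root of F.
t_value = math.sqrt(f_ratio)

print(round(ss_between, 2), round(ss_within, 2))
print(round(f_ratio, 3), round(t_value, 2))
```

The square root recovers only the size of t; its sign is given by the direction of the difference between the two means.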


The observed value of F = .086 is so small that one is justified 
in questioning whether the selection of candidates was really 
random. As a matter of fact it was not. Harrington supplied 
the figures to illustrate his procedures and at that time there was 
no intention to use them for any purpose other than to examine 
his methodology. 

It is of interest, however, that even for his entire group Har- 
rington did not find mean length of recommendation significantly 
different for persons who secured positions and for their unsuccess- 
ful competitors. By a process too long to be described here, he 
obtained a quality score for each recommendation. These quality 
scores differentiated the candidates whereas length had not done 
so. For variation between 183 candidates the mean square was 
76.48/182 = .420. For variation of 1035 recommendation scores 
around the candidate means the mean square was 

208.70/(1035 — 183) = .245. 


Therefore F = .420/.245 = 1.71. With 182 degrees of freedom for
the numerator and 852 for the denominator, $F_{.99}$ = 1.28. The
candidates cannot be treated as having the same means of quality
scores.

Comparison of Means of Several Measures of the Same 
Individuals. Let us return now to the problem of the effect of 
varying the order with which ranges of different length are used 
in lessons of archery. When this problem was discussed at the 
beginning of this chapter three independent groups were used, each 
group shooting at the 50-yard range in a different order from the 
other two. An objection to that form of experimental design is 
that the sample differences were based not only on order of shoot- 
ing, but also on differences in skill among the groups, so that 
difference between orders is confounded with difference between 
groups. 

One way of eliminating group differences from the design is to 
compare the scores on the same group of subjects. For simplicity 
in illustration we will present scores for 11 subjects even though 
data are available for all 33 subjects. For all 11 subjects scores 
made at the 50-yard range when this range was shot first, second 
and third are presented in Table 9.8. 


TABLE 9.8 Scores of 11 Archery Students at the 50-yard Range for Varying
Order of Shooting*

Subject    First     Second    Third     Sum      Mean
           order     order     order

1          114       182       123       419      139.67
2           99       121       119       339      113
3           51       100        35       186       62
4           78       144        72       294       98
5          134       125       201       460      153.33
6           71        38        49       158       52.67
7           66        67       107       240       80
8           33        89        51       173       57.67
9          146       159       157       462      154
10          80       101        83       264       88
11         105       113       113       331      110.33

Sum        977      1239      1110      3326
Mean        88.82    112.64    100.91              100.79

* Data from Schroeder.
It is important to notice the difference between the scores in
Table 9.1 and those in Table 9.8. In the former the scores in
any column can be rearranged in any arbitrary manner without
changing the meaning of the data. In the latter any interchange
of scores in a column must be accompanied by a corresponding
interchange in every other column since the three scores in any
row are linked by the fact that they belong to a given person.
This difference plays an important part in determining the
statistical model which is to be used in the analysis of the data.

We shall suppose that each observation in Table 9.8 is derived
from a normal population with a mean which depends both on
the order of shooting the range and the person doing the shooting.
Consider all the possible scores which might be made by the ith
person shooting this range in the jth order to be a population and
let the mean of that population be called $\mu_{ij}$. The single score
recorded for that person at that order is a random deviate from
$\mu_{ij}$. Thus each cell in Table 9.8 has its own population mean and
each of the 33 scores recorded there is a random deviate from
the population mean of the cell in which it stands. As both order
and person may influence the mean, it is conceivable that the
33 observations in the table come from 33 different means. These
33 possible population means, together with certain row and
column averages are displayed in Table 9.9. In this table

$\mu_{1.} = \frac{1}{3}(\mu_{11} + \mu_{12} + \mu_{13})$

and in general $\mu_{i.} = \frac{1}{3}(\mu_{i1} + \mu_{i2} + \mu_{i3})$. Similarly

$\mu_{.1} = \frac{1}{11}(\mu_{11} + \mu_{21} + \cdots + \mu_{11,1})$   and   $\mu_{.j} = \frac{1}{11}\sum_{i=1}^{11} \mu_{ij}$

The total mean $\mu$ is a mean of the 33 cell means, also of the 11 row
means, and also of the 3 column means.

$\mu = \frac{1}{33}\sum_{i=1}^{11}\sum_{j=1}^{3} \mu_{ij} = \frac{1}{11}\sum_{i=1}^{11} \mu_{i.} = \frac{1}{3}\sum_{j=1}^{3} \mu_{.j}$

or in general, if there are r rows and c columns

(9.16)   $\mu = \frac{1}{rc}\sum_{i=1}^{r}\sum_{j=1}^{c} \mu_{ij} = \frac{1}{r}\sum_{i=1}^{r} \mu_{i.} = \frac{1}{c}\sum_{j=1}^{c} \mu_{.j}$

The row and column means are expressive of the physical
aspects of the problem. The mean $\mu_{i.}$ is the skill of the ith person
regardless of order of shooting. The mean $\mu_{.j}$ is the mean of the jth
order regardless of subject. The mean $\mu$ is the average of all
effects. The difference $\mu_{i.} - \mu$ may be regarded as the person
component or the deviation of the skill of the ith person from
the skill of all 11 persons. Similarly the difference $\mu_{.j} - \mu$ is the
order component.


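TABLE 9.9 Population Means of Scores in Table 9.8

                         Order
Subject      First           Second          Third           Row means
1            $\mu_{11}$      $\mu_{12}$      $\mu_{13}$      $\mu_{1.}$
2            $\mu_{21}$      $\mu_{22}$      $\mu_{23}$      $\mu_{2.}$
3            $\mu_{31}$      $\mu_{32}$      $\mu_{33}$      $\mu_{3.}$
4            $\mu_{41}$      $\mu_{42}$      $\mu_{43}$      $\mu_{4.}$
5            $\mu_{51}$      $\mu_{52}$      $\mu_{53}$      $\mu_{5.}$
6            $\mu_{61}$      $\mu_{62}$      $\mu_{63}$      $\mu_{6.}$
7            $\mu_{71}$      $\mu_{72}$      $\mu_{73}$      $\mu_{7.}$
8            $\mu_{81}$      $\mu_{82}$      $\mu_{83}$      $\mu_{8.}$
9            $\mu_{91}$      $\mu_{92}$      $\mu_{93}$      $\mu_{9.}$
10           $\mu_{10,1}$    $\mu_{10,2}$    $\mu_{10,3}$    $\mu_{10.}$
11           $\mu_{11,1}$    $\mu_{11,2}$    $\mu_{11,3}$    $\mu_{11.}$
Column means $\mu_{.1}$      $\mu_{.2}$      $\mu_{.3}$      $\mu$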


To obtain information about the 33 unknown parameters of
Table 9.9 from the 33 observations of Table 9.8 is a hopeless task
unless the statistical model can be simplified in some way so that
the number of parameters is smaller than the number of observations.
We must make some assumption that will reduce the number
of unknown parameters. An assumption useful for this purpose
is that each cell mean $\mu_{ij}$ is the sum of the general mean, $\mu$, the
component for the ith person, and the component for the jth
order, that is

(9.17)   $\mu_{ij} = \mu + (\mu_{i.} - \mu) + (\mu_{.j} - \mu) = \mu_{i.} + \mu_{.j} - \mu.$

This assumption allows us to describe the 33 population means as 
in Table 9.10, which presents the statistical model that will be 
used in the analysis of the archery data in Table 9.8. 

The reader may verify that the entries in the right-hand column
are means of the entries in the corresponding rows and that
the entries in the lowest row are means of the column entries.
Another point to note in the table is that although 15 parameters
appear in Table 9.10 only 13 of these are independent. One
restriction is introduced by the fact that $\mu$ is the mean of the 11 row
means or the mean of the 3 column means. Another restriction
is introduced by the fact that the mean of the row means equals
the mean of the column means. Since we have now only 13
parameters instead of 33 the problem is manageable.
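The verification suggested to the reader can be carried out numerically on a small made-up case. The sketch below assumes three hypothetical row means and three hypothetical column means chosen to have the same average, builds each cell mean as row mean plus column mean minus general mean, and confirms that the margins of the resulting table return the means that were put in. The helper `independent_parameters` is likewise hypothetical and simply counts r + c - 1.

```python
# Hypothetical population means; both sets average 102, as the model requires.
row_means = [110.0, 95.0, 101.0]      # person means
col_means = [98.0, 106.0, 102.0]      # order means
general_mean = sum(row_means) / len(row_means)

# Additive model: cell mean = row mean + column mean - general mean.
cell_means = [[rm + cm - general_mean for cm in col_means] for rm in row_means]

# Margins recomputed from the cells.
row_margins = [sum(row) / len(row) for row in cell_means]
col_margins = [
    sum(row[j] for row in cell_means) / len(cell_means)
    for j in range(len(col_means))
]


def independent_parameters(r, c):
    """r row means, c column means and one general mean, less two restrictions."""
    return r + c + 1 - 2


print(row_margins, col_margins)
print(independent_parameters(11, 3))
```

For the archery table, with 11 rows and 3 columns, the count is 13, the number given in the text.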

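TABLE 9.10 Revised Population Means of Scores in Table 9.8

                                       Order
Subject      First                          Second                         Third                          Row means
1            $\mu_{1.} + \mu_{.1} - \mu$    $\mu_{1.} + \mu_{.2} - \mu$    $\mu_{1.} + \mu_{.3} - \mu$    $\mu_{1.}$
2            $\mu_{2.} + \mu_{.1} - \mu$    $\mu_{2.} + \mu_{.2} - \mu$    $\mu_{2.} + \mu_{.3} - \mu$    $\mu_{2.}$
3            $\mu_{3.} + \mu_{.1} - \mu$    $\mu_{3.} + \mu_{.2} - \mu$    $\mu_{3.} + \mu_{.3} - \mu$    $\mu_{3.}$
4            $\mu_{4.} + \mu_{.1} - \mu$    $\mu_{4.} + \mu_{.2} - \mu$    $\mu_{4.} + \mu_{.3} - \mu$    $\mu_{4.}$
5            $\mu_{5.} + \mu_{.1} - \mu$    $\mu_{5.} + \mu_{.2} - \mu$    $\mu_{5.} + \mu_{.3} - \mu$    $\mu_{5.}$
6            $\mu_{6.} + \mu_{.1} - \mu$    $\mu_{6.} + \mu_{.2} - \mu$    $\mu_{6.} + \mu_{.3} - \mu$    $\mu_{6.}$
7            $\mu_{7.} + \mu_{.1} - \mu$    $\mu_{7.} + \mu_{.2} - \mu$    $\mu_{7.} + \mu_{.3} - \mu$    $\mu_{7.}$
8            $\mu_{8.} + \mu_{.1} - \mu$    $\mu_{8.} + \mu_{.2} - \mu$    $\mu_{8.} + \mu_{.3} - \mu$    $\mu_{8.}$
9            $\mu_{9.} + \mu_{.1} - \mu$    $\mu_{9.} + \mu_{.2} - \mu$    $\mu_{9.} + \mu_{.3} - \mu$    $\mu_{9.}$
10           $\mu_{10.} + \mu_{.1} - \mu$   $\mu_{10.} + \mu_{.2} - \mu$   $\mu_{10.} + \mu_{.3} - \mu$   $\mu_{10.}$
11           $\mu_{11.} + \mu_{.1} - \mu$   $\mu_{11.} + \mu_{.2} - \mu$   $\mu_{11.} + \mu_{.3} - \mu$   $\mu_{11.}$
Column means $\mu_{.1}$                     $\mu_{.2}$                     $\mu_{.3}$                     $\mu$

To complete the statistical model we shall assume that each
of the 33 populations is normal, with mean as indicated by its
position in Table 9.10, and that all of these populations have
the same variance $\sigma^2$.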

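The observations and their row and column means may be
designated by symbols as in Table 9.11. Thus each cell of the
table has an observed value $X_{ij}$ and an unknown population mean
$\mu_{ij} = \mu_{i.} + \mu_{.j} - \mu$. Each row has an observed mean $\bar X_{i.}$ and an
unknown population mean $\mu_{i.}$. Each column has an observed mean
$\bar X_{.j}$ and an unknown population mean $\mu_{.j}$. The table as a whole
has an observed mean $\bar X$ and an unknown population mean $\mu$.

TABLE 9.11 Symbolism for Observations and Their Means for 3 Scores on
Each of 11 Subjects

                            Order
Subject          First          Second         Third          Mean for row      Sum for row
1                $X_{11}$       $X_{12}$       $X_{13}$       $\bar X_{1.}$     $T_{1.}$
2                $X_{21}$       $X_{22}$       $X_{23}$       $\bar X_{2.}$     $T_{2.}$
3                $X_{31}$       $X_{32}$       $X_{33}$       $\bar X_{3.}$     $T_{3.}$
4                $X_{41}$       $X_{42}$       $X_{43}$       $\bar X_{4.}$     $T_{4.}$
5                $X_{51}$       $X_{52}$       $X_{53}$       $\bar X_{5.}$     $T_{5.}$
6                $X_{61}$       $X_{62}$       $X_{63}$       $\bar X_{6.}$     $T_{6.}$
7                $X_{71}$       $X_{72}$       $X_{73}$       $\bar X_{7.}$     $T_{7.}$
8                $X_{81}$       $X_{82}$       $X_{83}$       $\bar X_{8.}$     $T_{8.}$
9                $X_{91}$       $X_{92}$       $X_{93}$       $\bar X_{9.}$     $T_{9.}$
10               $X_{10,1}$     $X_{10,2}$     $X_{10,3}$     $\bar X_{10.}$    $T_{10.}$
11               $X_{11,1}$     $X_{11,2}$     $X_{11,3}$     $\bar X_{11.}$    $T_{11.}$
Mean of column   $\bar X_{.1}$  $\bar X_{.2}$  $\bar X_{.3}$  $\bar X$
Sum of column    $T_{.1}$       $T_{.2}$       $T_{.3}$                         $T$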

The sample estimate for $\mu_{ij}$ is $\bar X_{i.} + \bar X_{.j} - \bar X$. The expression
$X_{ij} - \mu_{ij}$ is a random deviate. Therefore

$X_{ij} - (\bar X_{i.} + \bar X_{.j} - \bar X) = X_{ij} - \bar X_{i.} - \bar X_{.j} + \bar X$

is the best sample estimate of the error in $X_{ij}$. Expressions for
sample variances which will be needed in testing hypotheses will
now be given.

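Estimate of $\sigma^2$ Based on Error. This estimate of $\sigma^2$ is obtained
from the sample estimate of the error in $X_{ij}$. It measures the
effect of random deviation of observed scores $X_{ij}$ from values
expected on the basis of row and column means. If each $X_{ij}$ were
exactly equal to $\mu_{ij}$ this error variance would be zero. We shall
denote this estimate by the letter $E$. If $\mu_{ij} = \mu_{i.} + \mu_{.j} - \mu$, $E$ is
an unbiased estimate of $\sigma^2$ regardless of the values of the row and
column means in the population.

In general, if there are $r$ rows and $c$ columns,

(9.18) $E = \dfrac{\sum_{i=1}^{r} \sum_{j=1}^{c} (X_{ij} - \bar X_{i.} - \bar X_{.j} + \bar X)^2}{(r-1)(c-1)}$

It is more easily computed by the equivalent formula

(9.19) $E = \dfrac{\sum_{i=1}^{r} \sum_{j=1}^{c} X_{ij}^2 - \frac{1}{c} \sum_{i=1}^{r} \left( \sum_{j=1}^{c} X_{ij} \right)^2 - \frac{1}{r} \sum_{j=1}^{c} \left( \sum_{i=1}^{r} X_{ij} \right)^2 + \frac{1}{rc} \left( \sum_{i=1}^{r} \sum_{j=1}^{c} X_{ij} \right)^2}{(r-1)(c-1)}$

or

(9.20) $E = \dfrac{\sum \sum X_{ij}^2 - \frac{1}{c} \sum T_{i.}^2 - \frac{1}{r} \sum T_{.j}^2 + \frac{T^2}{N}}{(r-1)(c-1)}$

where $T_{i.}$ and $T_{.j}$ are the row and column sums, $T$ is the sum of all
scores, and $N = rc$.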

Estimate of $\sigma^2$ Obtained from Variation between Row Means.
In this problem, each row mean is the average of 3 scores made
by one of the 11 persons shooting at the target. The variance
among these 11 means is given by the expression
$\frac{1}{10} \sum_{i=1}^{11} (\bar X_{i.} - \bar X)^2$,
which is an estimate of $\frac{\sigma^2}{3} + \frac{1}{10} \sum_{i=1}^{11} (\mu_{i.} - \mu)^2$. Here $\frac{\sigma^2}{3}$ is the variance
of means of 3 scores if these scores are chosen at random from the
same population, with mean $\mu$ and variance $\sigma^2$, while $\frac{1}{10} \sum_{i=1}^{11} (\mu_{i.} - \mu)^2$
is a component due to variation among population means of rows.
The hypothesis $\mu_{1.} = \mu_{2.} = \cdots = \mu_{11.} = \mu$ would in this case be the
hypothesis that the 11 archers have equal skill. Under that
hypothesis this second component is zero and so $\frac{1}{10} \sum_{i=1}^{11} (\bar X_{i.} - \bar X)^2$ is
an unbiased estimate of $\sigma_{\bar X}^2 = \frac{\sigma^2}{3}$. Therefore under this hypothesis
an unbiased estimate of $\sigma^2$ is provided by

$\frac{3}{10} \sum_{i=1}^{11} (\bar X_{i.} - \bar X)^2$

This is the mean square among row means. For convenience we
shall denote it $R$. In general, if there are $r$ rows and $c$ columns
and one score in each cell, an unbiased estimate of $\sigma^2$ is provided by

(9.21) $R = \dfrac{c \sum_{i=1}^{r} (\bar X_{i.} - \bar X)^2}{r - 1}$

if the hypothesis $\mu_{1.} = \mu_{2.} = \cdots = \mu_{r.} = \mu$ is true.

A more convenient routine for computing $R$ is

(9.22) $R = \dfrac{\frac{1}{c} \sum_{i=1}^{r} T_{i.}^2 - \frac{T^2}{N}}{r - 1}$

The numerator of (9.21) and of (9.22) is the sum of squares for
rows and the denominator is the corresponding degrees of freedom.

Estimate of $\sigma^2$ Obtained from Variation between Column
Means. In this problem a column mean is the average score
made by all 11 archers on one particular order of shooting. The
variance among the three column means is
$\frac{1}{2} \sum_{j=1}^{3} (\bar X_{.j} - \bar X)^2$, which is an estimate of $\frac{\sigma^2}{11} + \frac{1}{2} \sum_{j=1}^{3} (\mu_{.j} - \mu)^2$.
Here $\frac{\sigma^2}{11}$ is the variance of means of 11 scores if these are taken at
random, while $\frac{1}{2} \sum_{j=1}^{3} (\mu_{.j} - \mu)^2$ is a component due to variation among
population means of columns. Under the hypothesis that the
three column means are equal ($\mu_{.1} = \mu_{.2} = \mu_{.3} = \mu$) this component
is zero and so $\frac{1}{2} \sum_{j=1}^{3} (\bar X_{.j} - \bar X)^2$ is an unbiased estimate of $\sigma_{\bar X}^2 = \sigma^2/11$.
Therefore under this hypothesis an unbiased estimate of $\sigma^2$ is
provided by

$\frac{11}{2} \sum_{j=1}^{3} (\bar X_{.j} - \bar X)^2$

This is the mean square among column means which will be
denoted $C$. In general, for $r$ rows and $c$ columns,

(9.23) $C = \dfrac{r \sum_{j=1}^{c} (\bar X_{.j} - \bar X)^2}{c - 1}$

A more convenient routine for computing $C$ is

(9.24) $C = \dfrac{\frac{1}{r} \sum_{j=1}^{c} T_{.j}^2 - \frac{T^2}{N}}{c - 1}$

The numerator of (9.23) and of (9.24) is the sum of squares for
columns and the denominator is the corresponding degrees of
freedom.

Hypotheses and Procedure for Testing Them. An important
characteristic of the sample mean squares $E$, $R$, and $C$ is that
under the assumptions of normality and equality of variance in
the population, these three statistics are independently distributed.
Hence the F test may be used to test hypotheses.

Usually two separate hypotheses are formulated, the
hypothesis concerning row means:

$H_r: \mu_{1.} = \mu_{2.} = \cdots = \mu_{r.} = \mu$

and the hypothesis concerning column means:

$H_c: \mu_{.1} = \mu_{.2} = \cdots = \mu_{.c} = \mu$

These hypotheses may be tested separately and one may be
rejected without the other.

To test hypothesis $H_r$ that row means are equal, the ratio $F_r = R/E$ is calculated.
To test hypothesis $H_c$ that column means are equal, the ratio $F_c = C/E$ is calculated.


It is instructive to carry out the calculations indicated by
Formulas (9.18), (9.21), and (9.23) directly, although far simpler
procedures are available and will be described below. Students
do not always correctly understand the meaning of the error term
$E$, so Table 9.12 has been set up to list in detail the numbers (for
the data of Table 9.8) whose squares are added to obtain the error
sum of squares, which is the numerator of $E$. Each entry in Table
9.12 is a residual, the amount by which the observed value $X_{ij}$
differs from the estimate of that value based on row mean, column
mean and total mean.

$E = \dfrac{(-13.70)^2 + (30.48)^2 + \cdots + (2.55)^2}{(10)(2)} = \dfrac{12530.909}{20} = 626.55$

$R = \frac{3}{10} \{ (139.67 - 100.79)^2 + (113 - 100.79)^2 + \cdots + (110.33 - 100.79)^2 \} = 4087.4$

$C = \frac{11}{2} [ (88.82 - 100.79)^2 + (112.64 - 100.79)^2 + (100.91 - 100.79)^2 ] = 1560.5$

TABLE 9.12 Value of $X_{ij} - (\bar X_{i.} + \bar X_{.j} - \bar X)$ for Each Entry in Table 9.8

Subject   First order   Second order   Third order   Sum
1           -13.70         30.48         -16.79      -.01
2            -2.03         -3.85           5.88       .00
3              .97         26.15         -27.12       .00
4            -8.03         34.15         -26.12       .00
5            -7.36        -40.18          47.55       .01
6            30.30        -26.52          -3.79      -.01
7            -2.03        -24.85          26.88       .00
8           -12.70         19.48          -6.79      -.01
9             3.97         -6.85           2.88       .00
10            3.97          1.15          -5.12       .00
11            6.64         -9.18           2.55       .01
Sum            .00          -.02            .01      -.01

$F_r = \dfrac{R}{E} = \dfrac{4087.4}{626.55} = 6.52$. The error mean square is always
placed in the denominator and the critical region is in the right
tail of the F curve. For $n_1 = 10$ and $n_2 = 20$, $F_{.99} = 3.37$. The
observed value 6.52 clearly falls in the region of rejection for
$\alpha = .01$, and consequently the hypothesis that individuals do not
differ in skill must be rejected.

$F_c = \dfrac{C}{E} = \dfrac{1560.5}{626.55} = 2.49$. For $n_1 = 2$ and $n_2 = 20$, $F_{.95} = 3.49$.
The observed value 2.49 falls in the region of acceptance with
$\alpha = .05$; hence there is no reason to reject the hypothesis that
performance is independent of order of shooting, no reason to suppose
that higher scores are to be expected when the 50-yard range is
shot in one order than another.
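
The direct calculation by Formulas (9.18), (9.21), and (9.23) can be sketched in a few lines of Python. This is an illustrative sketch, not part of the original procedure: the variable names are ours, the scores are those listed for Table 9.8 in Table 9.13, and the eleventh subject's scores are taken as 105, 113, 113, an assumption made so that the row agrees with its printed total of 331. Because the means are not rounded before squaring, R and C may differ slightly from the hand-computed 4087.4 and 1560.5.

```python
# Archery scores (11 subjects x 3 orders of shooting), as listed in Table 9.13.
# Assumption: row 11 is 105, 113, 113, which matches its printed total of 331.
scores = [
    [114, 182, 123],
    [99, 121, 119],
    [51, 100, 35],
    [78, 144, 72],
    [134, 125, 201],
    [71, 38, 49],
    [66, 67, 107],
    [33, 89, 51],
    [146, 159, 157],
    [80, 101, 83],
    [105, 113, 113],
]

r = len(scores)      # number of rows (subjects)
c = len(scores[0])   # number of columns (orders)

row_means = [sum(row) / c for row in scores]
col_means = [sum(scores[i][j] for i in range(r)) / r for j in range(c)]
grand_mean = sum(map(sum, scores)) / (r * c)

# Residuals X_ij - (row mean + column mean - grand mean), as in Table 9.12.
residuals = [
    [scores[i][j] - row_means[i] - col_means[j] + grand_mean for j in range(c)]
    for i in range(r)
]

# Mean squares by Formulas (9.18), (9.21), and (9.23).
E = sum(e * e for row in residuals for e in row) / ((r - 1) * (c - 1))
R = c * sum((m - grand_mean) ** 2 for m in row_means) / (r - 1)
C = r * sum((m - grand_mean) ** 2 for m in col_means) / (c - 1)

# Variance ratios; the error mean square is always the denominator.
F_r = R / E
F_c = C / E

print(f"E = {E:.2f}  R = {R:.2f}  C = {C:.2f}")
print(f"F_r = {F_r:.3f}  F_c = {F_c:.3f}")
```

The two ratios would then be compared with the tabled values of F for (10, 20) and (2, 20) degrees of freedom, exactly as in the text.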

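Additive Nature of Component Sums of Squares. For each
$X_{ij}$ the following identity holds

$X_{ij} - \bar X = (\bar X_{i.} - \bar X) + (\bar X_{.j} - \bar X) + (X_{ij} - \bar X_{i.} - \bar X_{.j} + \bar X)$

When this identity is squared we get

$(X_{ij} - \bar X)^2 = (\bar X_{i.} - \bar X)^2 + (\bar X_{.j} - \bar X)^2 + (X_{ij} - \bar X_{i.} - \bar X_{.j} + \bar X)^2$
$\quad + 2[(\bar X_{i.} - \bar X)(\bar X_{.j} - \bar X) + (\bar X_{i.} - \bar X)(X_{ij} - \bar X_{i.} - \bar X_{.j} + \bar X)$
$\quad + (\bar X_{.j} - \bar X)(X_{ij} - \bar X_{i.} - \bar X_{.j} + \bar X)]$

When the sum is taken over all rows and all columns, the sum of
each expression inside the square brackets is found to be
identically zero, so the cross product terms vanish, leaving

(9.25) $\sum_{i=1}^{r} \sum_{j=1}^{c} (X_{ij} - \bar X)^2 = c \sum_{i=1}^{r} (\bar X_{i.} - \bar X)^2 + r \sum_{j=1}^{c} (\bar X_{.j} - \bar X)^2 + \sum_{i=1}^{r} \sum_{j=1}^{c} (X_{ij} - \bar X_{i.} - \bar X_{.j} + \bar X)^2$

Total sum of squares = sum of squares for rows + sum of squares for columns + sum of squares for error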

The three additive component sums of squares are independently 
distributed with r — 1 degrees of freedom among row means, c — 1 
degrees of freedom among column means, and (r- 1)(c— 1) 
degrees of freedom for error. These degrees of freedom together 
make up the rc — 1 degrees of freedom for the total sum of squares. 
When each of the component sums is divided by its degrees of 
freedom, the results are the mean squares R, C, and E. 
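
The additive property can be checked numerically. The sketch below uses a small hypothetical 4 x 3 table (the values are invented purely for illustration) and confirms that the cross-product sums vanish and that the three component sums of squares add to the total.

```python
# Hypothetical 4 x 3 table of scores, one score per cell (illustrative values only).
table = [
    [12.0, 15.0, 11.0],
    [9.0, 14.0, 10.0],
    [20.0, 18.0, 25.0],
    [7.0, 13.0, 8.0],
]
r, c = len(table), len(table[0])

row_means = [sum(row) / c for row in table]
col_means = [sum(table[i][j] for i in range(r)) / r for j in range(c)]
grand = sum(map(sum, table)) / (r * c)

cells = [(i, j) for i in range(r) for j in range(c)]

def resid(i, j):
    """Residual X_ij - row mean - column mean + grand mean."""
    return table[i][j] - row_means[i] - col_means[j] + grand

# The four sums of squares appearing in the additive identity.
ss_total = sum((table[i][j] - grand) ** 2 for i, j in cells)
ss_rows = c * sum((m - grand) ** 2 for m in row_means)
ss_cols = r * sum((m - grand) ** 2 for m in col_means)
ss_error = sum(resid(i, j) ** 2 for i, j in cells)

# The three cross-product sums, each of which should be zero.
cross_rows_cols = sum((row_means[i] - grand) * (col_means[j] - grand) for i, j in cells)
cross_rows_err = sum((row_means[i] - grand) * resid(i, j) for i, j in cells)
cross_cols_err = sum((col_means[j] - grand) * resid(i, j) for i, j in cells)

# Degrees of freedom add in the same way.
df_total, df_rows, df_cols, df_error = r * c - 1, r - 1, c - 1, (r - 1) * (c - 1)

print(ss_total, ss_rows + ss_cols + ss_error)
print(cross_rows_cols, cross_rows_err, cross_cols_err)
```

Up to floating-point rounding, the two printed sums of squares agree and the three cross-product sums are zero.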
Formula (9.25) may be rewritten in equivalent form as

(9.26) $\sum \sum X_{ij}^2 - \dfrac{T^2}{N} = \left( \dfrac{1}{c} \sum T_{i.}^2 - \dfrac{T^2}{N} \right) + \left( \dfrac{1}{r} \sum T_{.j}^2 - \dfrac{T^2}{N} \right) + \left( \sum \sum X_{ij}^2 - \dfrac{1}{c} \sum T_{i.}^2 - \dfrac{1}{r} \sum T_{.j}^2 + \dfrac{T^2}{N} \right)$

where the quantities in parentheses are the numerators of
Formulas (9.22), (9.24), and (9.20) respectively. An easy way to
obtain the sum of squares due to error is expressed schematically
as follows:

Error = Total - Rows - Columns

Table 9.13 shows computations for obtaining the three
component sums of squares by the routine of Formula (9.26) for the
archery scores of Table 9.8. The results of this computation are
shown in Table 9.14.

TABLE 9.13 Computation of Sums of Squares for Data of Table 9.8

Subject               $X_{i1}$   $X_{i2}$   $X_{i3}$   $T_{i.}$   $T_{i.}^2$   $\sum_j X_{ij}^2$
1                        114        182        123        419       175561         61249
2                         99        121        119        339       114921         38603
3                         51        100         35        186        34596         13826
4                         78        144         72        294        86436         32004
5                        134        125        201        460       211600         73982
6                         71         38         49        158        24964          8886
7                         66         67        107        240        57600         20294
8                         33         89         51        173        29929         11611
9                        146        159        157        462       213444         71246
10                        80        101         83        264        69696         23490
11                       105        113        113        331       109561         36563
$T_{.j}$                 977       1239       1110       3326
$T_{.j}^2$            954529    1535121    1232100
$\sum_i X_{ij}^2$      98665     156231     136858     391754

$T^2/N = (3326)^2/33 = 335220.5$

$\sum T_{i.}^2 / c = 1128308/3 = 376102.7$
$T^2/N = 335220.5$
Sum of squares for rows = 40882.2

$\sum T_{.j}^2 / r = 3721750/11 = 338340.9$
$T^2/N = 335220.5$
Sum of squares for columns = 3120.4

$\sum \sum X_{ij}^2 = 391754$
$T^2/N = 335220.5$
Total sum of squares = 56533.5

40882.2 + 3120.4 = 44002.6
Sum of squares for error = 56533.5 - 44002.6 = 12530.9


TABLE 9.14 Analysis of Variance of Data from Table 9.8

Source of variation            Sum of squares   Degrees of freedom   Mean square   F      Critical F
Total group                       56533.5              32
Between means of individuals      40882.2              10              4088.22     6.53   3.37 ($F_{.99}$)
Between order means                3120.4               2              1560.2      2.49   3.49 ($F_{.95}$)
Error                             12530.9              20               626.5
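
The routine of Formula (9.26), which works from row totals, column totals, and the grand total rather than from deviations, might be sketched as follows. The function name `two_way_sums_of_squares` and the dictionary layout are our own choices for illustration; the data are the archery scores as listed in Table 9.13, with row 11 taken as 105, 113, 113 (an assumption consistent with its printed total of 331).

```python
def two_way_sums_of_squares(scores):
    """Sums of squares and degrees of freedom for an r x c table with one
    score per cell, computed from totals as in Formula (9.26)."""
    r, c = len(scores), len(scores[0])
    n = r * c
    row_totals = [sum(row) for row in scores]                       # T_i.
    col_totals = [sum(row[j] for row in scores) for j in range(c)]  # T_.j
    correction = sum(row_totals) ** 2 / n                           # T^2 / N
    sum_of_squares = sum(x * x for row in scores for x in row)      # sum of X_ij^2

    total = sum_of_squares - correction
    rows = sum(t * t for t in row_totals) / c - correction
    cols = sum(t * t for t in col_totals) / r - correction
    error = total - rows - cols          # Error = Total - Rows - Columns
    return {
        "total": (total, n - 1),
        "rows": (rows, r - 1),
        "columns": (cols, c - 1),
        "error": (error, (r - 1) * (c - 1)),
    }

# Archery scores as listed in Table 9.13 (row 11 assumed to be 105, 113, 113).
archery = [
    [114, 182, 123], [99, 121, 119], [51, 100, 35], [78, 144, 72],
    [134, 125, 201], [71, 38, 49], [66, 67, 107], [33, 89, 51],
    [146, 159, 157], [80, 101, 83], [105, 113, 113],
]

result = two_way_sums_of_squares(archery)
error_ms = result["error"][0] / result["error"][1]
for source in ("total", "rows", "columns", "error"):
    ss, df = result[source]
    line = f"{source:8s} SS = {ss:9.1f}  df = {df:2d}"
    if source != "total":
        line += f"  MS = {ss / df:8.2f}"
    if source in ("rows", "columns"):
        line += f"  F = {(ss / df) / error_ms:.2f}"
    print(line)
```

Subtracting the row and column sums of squares from the total avoids computing any residuals, which is the saving the text has in mind.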


In general the analysis of variance computations may be sum- 
marized as in Table 9.15. 


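TABLE 9.15 General Summary of Analysis of Variance Components for Data
with r Rows and c Columns and One Score in Each Cell

Source of variation     Sum of squares                                                                                d. f.          Expectation of mean square
Total                   $\sum \sum X_{ij}^2 - \frac{T^2}{N}$                                                          $rc - 1$
Between row means       $\frac{1}{c} \sum T_{i.}^2 - \frac{T^2}{N}$                                                   $r - 1$        $\sigma^2 + \dfrac{c \sum (\mu_{i.} - \mu)^2}{r - 1}$
Between column means    $\frac{1}{r} \sum T_{.j}^2 - \frac{T^2}{N}$                                                   $c - 1$        $\sigma^2 + \dfrac{r \sum (\mu_{.j} - \mu)^2}{c - 1}$
Error                   $\sum \sum X_{ij}^2 - \frac{1}{c} \sum T_{i.}^2 - \frac{1}{r} \sum T_{.j}^2 + \frac{T^2}{N}$   $(r-1)(c-1)$   $\sigma^2$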


Computations When Original Scores are not Available.
Sometimes one wishes to apply the test described on pages 216 to 226
to published data for which the original scores are not available.
It would then be impossible to use any of the formulas which
have been presented in this chapter, but if for each subgroup the
computed mean and variance or standard deviation are given and
the number of cases, the formulas can be adapted as follows:

(9.27) Mean of total group, $\bar X = \dfrac{\sum_{j=1}^{k} N_j \bar X_j}{N} = \dfrac{\sum N_j \bar X_j}{\sum N_j}$

(9.28) Sum of squares within groups, $\sum_{j=1}^{k} (N_j - 1) s_j^2$

(9.29) Sum of squares among means, $\sum_{j=1}^{k} N_j (\bar X_j - \bar X)^2 = \sum N_j \bar X_j^2 - N \bar X^2$


A difficulty often encountered in trying to use published data in 
this way is that the original computer may not have retained 
enough digits to have any significant digits in the sum of squares 
among means. Another difficulty is that one cannot always be 
sure whether N or N — 1 was used as denominator in the original 
computation of the variance. 
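
As a sketch of how Formulas (9.27) to (9.29) might be applied, the following Python fragment builds the analysis from group sizes, means, and variances alone. The three groups are hypothetical, the function name `anova_from_summary` is ours, and the variances are assumed to have been computed with N - 1 in the denominator, which is exactly the point on which published reports are often unclear.

```python
from statistics import mean, variance

# Hypothetical raw scores, used here only to manufacture "published" summaries.
groups = [
    [12.0, 15.0, 11.0, 14.0],
    [18.0, 20.0, 17.0, 19.0, 21.0],
    [10.0, 9.0, 13.0],
]

# What a published report might give: (N_j, mean_j, s_j^2) for each group.
# statistics.variance uses N - 1 as denominator (assumed convention).
summary = [(len(g), mean(g), variance(g)) for g in groups]

def anova_from_summary(summary):
    """One-way analysis of variance from group sizes, means, and variances."""
    n_total = sum(n for n, _, _ in summary)
    k = len(summary)
    grand_mean = sum(n * m for n, m, _ in summary) / n_total                 # (9.27)
    ss_within = sum((n - 1) * s2 for n, _, s2 in summary)                    # (9.28)
    ss_among = sum(n * m * m for n, m, _ in summary) - n_total * grand_mean ** 2  # (9.29)
    f_ratio = (ss_among / (k - 1)) / (ss_within / (n_total - k))
    return grand_mean, ss_within, ss_among, f_ratio

grand_mean, ss_within, ss_among, f_ratio = anova_from_summary(summary)
print(f"grand mean = {grand_mean:.3f}")
print(f"SS within = {ss_within:.3f}, SS among = {ss_among:.3f}, F = {f_ratio:.3f}")
```

If the published variances had instead been computed with N in the denominator, the within-groups term would be the sum of N_j s_j^2 rather than (N_j - 1) s_j^2.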

Factors Influencing F. In order to assist the student in rec- 
ognizing the general kinds of situations in which the mean square 
for means is typically less than, equal to, or greater than the mean 
square for individuals within groups let us consider a hypothetical 
situation. Imagine that 500 names have been taken at random 
from the list of registered voters in New York City, and that a 
card has been made out for each of the 500 persons, containing 
such information as age, sex, marital status, number of children, 
occupation, number of years of schooling completed, height, 
weight, index of physical strength, party ticket voted in preceding 
presidential election, score on a mental test, score on a test of 
public affairs, voting precinct. Obviously such data would be
difficult to obtain, but they are real enough to the imagination to 
serve as a good illustration. 

A. Now let the 500 cards be thoroughly shuffled and dealt into 
20 random piles of 25 each. For the mental test scores, let the 
mean square among groups and the mean square within groups 
be computed and the F ratio obtained and recorded. Then let 
the cards be shuffled again and again dealt into 20 random piles 
of 25 cards each and the F ratio computed. Let this be done many
times. Sometimes the mean square between groups will be larger
and sometimes smaller than the mean square within groups. The
distribution of the obtained values of F will approximate a smooth
F curve with $n_1 = 19$ and $n_2 = 480$.

If instead of score on mental test, score on test of public affairs 
had been used, both mean squares might have been very different 
from those obtained in A but the theoretical F distribution would 
be the same because its form depends upon the number of de- 
grees of freedom and not upon the nature of the physical variate 
studied. 

Suppose the same procedure were carried through with the
data on income, or on number of years of schooling completed.
These variates probably have a very skewed distribution and the
assumption of normality in the trait measured, which is basic
to the F test, would not be met. Certain other traits named such 
as index of physical strength, height, or weight, might have a 
bimodal distribution if both men and women are in the group 
studied. However, if the distribution of the trait does not depart 
too radically from the normal, the distribution of the variance 
ratio will resemble the F distribution under the experimental con- 
ditions described above. 

B. Now let the same 500 cards be sorted on the basis of voting 
precinct, and let us suppose 20 precincts are represented. For 
several of the variates we have mentioned, a group of persons 
living in the same neighborhood and consequently registered in 
the same precinct would be more homogeneous than a group ob- 
tained by random selection from the entire city. Hence in this 
situation the mean square within groups is almost certain to be 
reduced and that between groups increased in comparison with 
situation A. Consider income or years of schooling completed. 
One feels a rather strong conviction that group means will vary 
more from precinct to precinct than in A and that individual 
scores will vary less from person to person in the same precinct 
than in A. Consequently, the numerator of the variance ratio
would tend to be larger than the denominator, though not
necessarily larger in every sample. Also, if the cards were sorted on
the basis of occupation or sex, the income data might be expected
to show a mean square between groups in general larger than that
within groups, and therefore the variance ratio would not conform
to the F distribution but would have fewer small values of F and
more large values than would be expected under situation A,
because the various groups are in reality samples of different
populations.

C. Now let the same 500 cards be sorted as before by pre- 
cincts, but instead of keeping the cards from a given precinct 
together we shall distribute them so that each group receives 
approximately 1/20 of the cards from each precinct. Now if the 
mean precinct score on any given trait such as income varies 
greatly from precinct to precinct, this new sorting will have the
effect of making the mean square between means in general less 
than in either A or B. If the mean square between means is re- 
duced that within groups must be increased for the total mean 
square has not been affected by the manner of sorting the cards. 
Therefore in such a case F would tend to be smaller than 1.
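
The three situations can be imitated with a small simulation. Everything in the sketch below is hypothetical: the 500 "cards" are generated from 20 precincts of 25 voters each, with precinct means that differ, and the helper names (`f_ratio`, `deal`, and so on) are ours. It is meant only to show the direction of the effects described in A, B, and C, not to reproduce any real data.

```python
import random

def f_ratio(piles):
    """Mean square between piles divided by mean square within piles."""
    k = len(piles)
    n_total = sum(len(p) for p in piles)
    grand = sum(sum(p) for p in piles) / n_total
    means = [sum(p) / len(p) for p in piles]
    ss_between = sum(len(p) * (m - grand) ** 2 for p, m in zip(piles, means))
    ss_within = sum((x - m) ** 2 for p, m in zip(piles, means) for x in p)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

rng = random.Random(0)  # fixed seed so that the sketch is repeatable

# Hypothetical cards: (precinct, score), 20 precincts of 25 voters each.
precinct_means = [rng.gauss(100, 15) for _ in range(20)]
cards = [(p, rng.gauss(precinct_means[p], 10)) for p in range(20) for _ in range(25)]

def deal(ordered_cards):
    """Deal cards into 20 piles of 25, one card to each pile in turn."""
    return [[score for _, score in ordered_cards[k::20]] for k in range(20)]

def situation_a():
    """Random piles: shuffle thoroughly, then deal."""
    order = cards[:]
    rng.shuffle(order)
    return f_ratio(deal(order))

def situation_b():
    """One pile per precinct."""
    return f_ratio([[s for p, s in cards if p == q] for q in range(20)])

def situation_c():
    """Each pile receives about 1/20 of the cards of every precinct."""
    rank = rng.sample(range(20), 20)          # random order of precincts
    order = cards[:]
    rng.shuffle(order)                        # random order within precinct
    order.sort(key=lambda card: rank[card[0]])  # stable sort keeps that order
    return f_ratio(deal(order))

reps = 300
mean_a = sum(situation_a() for _ in range(reps)) / reps
f_b = situation_b()
mean_c = sum(situation_c() for _ in range(reps)) / reps

print(f"A (random piles):       average F = {mean_a:.2f}")
print(f"B (piles by precinct):  F = {f_b:.2f}")
print(f"C (precincts spread):   average F = {mean_c:.2f}")
```

Under these assumptions the average F in situation A should lie near 1, the single F in situation B should be far larger, and the average F in situation C should fall below 1.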




EXERCISE 9.2 

1. In each of the following situations state the number of degrees of
freedom with which the F table should be entered and the corresponding
value of $F_{.95}$.

a. A comparison of the means of 5 groups in which $N_1 = 12$, $N_2 = 30$,
$N_3 = 6$, $N_4 = 13$, $N_5 = 8$.

b. A comparison of the means of two groups each having 25 cases. 

c. There are 5 test scores for each of 20 subjects. We want to test the
hypothesis that the scores do not distinguish the subjects, that all subjects 
have the same mean. 

d. There are 5 test scores on each of 20 subjects. We want to test 
the hypothesis that the tests are equally difficult, that they all have the 
same mean. 

2. Complete the analysis of the data in Table 9.6. 

3. In a study² of the amount of community information possessed
by 79 elementary school teachers in St. Louis, Harris constructed a social
information test and administered it to teachers in four schools. These
schools were located in districts that differed markedly as to socio-economic
status, Jefferson being at one extreme and Banneker at the other.
However, if the differences among the means are no greater than might readily
occur by chance, it would be inappropriate to seek for any other
explanation of their origin. From the data reported on page 192, test the hypothesis
that $\mu_1 = \mu_2 = \mu_3 = \mu_4$.

REFERENCES 

1. Harrington, Wells, Recommendation Quality and Placement Success,
Psychological Monograph, 55 (1943).

2. Harris, Ruth M., Teachers' Social Knowledge and Its Relation to Pupils' Responses,
New York, 1941, Teachers College, Columbia University, Bureau of
Publications.

3. Schroeder, Elinor M., On Measurement of Motor Skills, New York, 1945, King's
Crown Press.



10 Linear Regression and Correlation


In this chapter, methods of statistical inference will be 
applied to bivariate data, that is to data in which each individual 
receives a score on two variables, two scaled traits. In Chapter 13
the work of the present chapter will be extended to situations in 
which each individual receives a score on more than two variables 
(multivariate analysis). In Chapter 11 there will be a discussion 
of methods of measuring relationship between two traits when 
for some reason the methods of the current chapter are not 
applicable. 

Although many readers of this chapter will have studied re- 
gression and correlation in an earlier course, a brief review of the 
principles, formulas and computing procedures may be helpful to 
some. Such a review will be presented in the first part of this 
chapter. 

Linear Regression. Regression is the estimation, or predic- 
tion, of unknown values of one variable from known values of 
another variable. An example of such use of regression is the 
prediction of grades at the end of a course from grades in a prog- 
nostic test given prior to the beginning of the course. 

Let X be a variable for which the value is known for each
individual in a study. This is commonly called the independent
variable. Let Y be a variable for which the value is to be
estimated for some individuals. This is commonly called the
dependent variable. Let $X_\alpha$ and $Y_\alpha$ be scores for the $\alpha$th
individual. The reader should note that this is a new use of the
letter $\alpha$, differing from its use to designate the size of a region of
rejection. The letter $\alpha$ will be used in both senses from this point
on, but its meaning will be readily apparent from context. From
this point on, many problems will involve summing over rows, over
columns, and also over individuals within cells, and formulas may
require three, or sometimes more, subscripts. The use of English
letters to denote the categories and the Greek letter $\alpha$ to denote
the individual within the category makes the meaning of a formula
more obvious. Thus $\sum_\alpha X_{ij\alpha}$ can be understood at once as the sum
for all individuals in the $ij$th class, while $\sum_k X_{ijk}$ might mean
different things in different problems and would require a specific
definition to state whether all three letters indicated categories or
one of them referred to individuals. Let $\tilde Y$ be the estimate of an
individual's unknown Y from his known X. It is called "regression
Y," "estimated Y," "predicted Y," "tilde Y," and sometimes
"curlicue Y." The meaning of $Y_\alpha$ as an observed value
and $\tilde Y_\alpha$ as the best linear estimate based on knowledge of $X_\alpha$
must never be confused. (For an account of the origin of the term
"regression" see reference 11.)

The relation between $\tilde Y$ and X is called the regression equation
and, for linear regression, is given by the formula

(10.1) $\tilde Y_\alpha = \bar Y + b_{yx}(X_\alpha - \bar X)$

In this formula $\tilde Y_\alpha$ and $X_\alpha$ vary from individual to individual,
while $\bar Y$, $b_{yx}$, and $\bar X$ are constant for all individuals to whom the
formula is applied.

The statistic $b_{yx}$ may be expressed in terms of deviations from
the means as

(10.2) $b_{yx} = \dfrac{\sum_{\alpha=1}^{N} (X_\alpha - \bar X)(Y_\alpha - \bar Y)}{\sum_{\alpha=1}^{N} (X_\alpha - \bar X)^2}$


The computing routine suggested by this and Formulas (10.3) 
to (10.5) would be unnecessarily laborious and would incur large 
rounding errors. Equivalent formulas that are more convenient 
to use will be found in later sections where machine computation 
and computation from a scatter diagram are treated. There is a 
very large number of equivalent formulas which are useful to know. 
The efficient computer understands the algebraic relations among 
them and makes use of whichever one appears to offer the simplest 
routine for the data in hand. In order to keep the development 
simple, only the most necessary formulas are presented in the 
chapter but a longer list is given in an Appendix to the chapter. 

The reason for heading this section "Linear regression" is
that the graph of Formula (10.1) is a straight line. This formula
is appropriate to use when the mean value of Y for each given value
of X lies near a straight line. If these means depart considerably
from the straight regression line, a curvilinear regression line is
more appropriate. Curvilinear regression will not be taken up
here. It is treated in Chapter 23 of reference 1, in Chapter 6 of
reference 5, and in Chapter 12 of reference 8.

If a graph of Formula (10.1) were plotted on a chart of the XY
distribution the constant $b_{yx}$ would represent the slope of the line,
and the line would pass through the point $(\bar X, \bar Y)$. The reader
who does not understand the relation between an equation and its
graph will find Chapter 10 of reference 10 helpful.

The difference $Y_\alpha - \tilde Y_\alpha$ between an individual's observed Y
score and the regression estimate for that score is called a residual
or residual error. The sum of the squares of the residuals for all
N individuals is

(10.3) $\sum (Y - \tilde Y)^2 = \sum (Y - \bar Y)^2 - \dfrac{[\sum (X - \bar X)(Y - \bar Y)]^2}{\sum (X - \bar X)^2}$

This sum has N - 2 degrees of freedom because there are N
observations from which two constants, the mean of Y and the
slope of the regression line, have been obtained.

The equation in Formula (10.3) may be read as follows: The
sum of squares of deviations of individual Y scores from the
regression estimates corresponding to these scores is less than the
sum of squares of the deviations from the mean, $\bar Y$, for all these
scores by the quantity

$\dfrac{[\sum (X - \bar X)(Y - \bar Y)]^2}{\sum (X - \bar X)^2}$

If only Y scores are available, $\bar Y$ is the best estimate of an
unknown Y. For a given sample size the quantity $\sum (Y - \bar Y)^2$ is a
measure of the precision of estimate obtained from $\bar Y$. If an X
score is known for an individual, but the Y score is not, then $\tilde Y$,
calculated from the known X, is the best estimate of Y. For a
given sample size the precision of an estimate of Y obtained from
a known X is given by $\sum (Y - \tilde Y)^2$. The quantity

$\dfrac{[\sum (X - \bar X)(Y - \bar Y)]^2}{\sum (X - \bar X)^2}$

measures the improvement in precision of estimating Y obtained
by using X over the precision in not using X. Clearly, the larger
this quantity is, the greater is the improvement in estimating Y by
using X.
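
Formulas (10.1) to (10.3) can be traced through on a handful of cases. The sketch below uses five (X, Y) pairs chosen only for illustration (they happen to be the first five students of Table 10.1); the variable names are ours. With so few cases the slope will not equal the value found later for the whole class.

```python
# Five illustrative (X, Y) pairs; any small set of paired scores would serve.
X = [52, 57, 56, 30, 47]
Y = [62, 50, 57, 32, 55]
n = len(X)

x_bar = sum(X) / n
y_bar = sum(Y) / n

# Sums of squares and products of deviations from the means.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
s_xx = sum((x - x_bar) ** 2 for x in X)
s_yy = sum((y - y_bar) ** 2 for y in Y)

b_yx = s_xy / s_xx                                    # Formula (10.2)
y_tilde = [y_bar + b_yx * (x - x_bar) for x in X]     # Formula (10.1)

# Left and right sides of Formula (10.3).
ss_residual = sum((y - yt) ** 2 for y, yt in zip(Y, y_tilde))
gain = s_xy ** 2 / s_xx          # improvement in precision from using X
ss_from_identity = s_yy - gain

print(f"b_yx = {b_yx:.3f}")
print(f"sum of squared residuals = {ss_residual:.3f}")
print(f"sum of squares about the mean minus gain = {ss_from_identity:.3f}")
```

The last two printed values agree, which is the content of Formula (10.3): using X reduces the sum of squared errors of estimate by exactly the quantity called `gain` here.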


The Correlation Coefficient. We may write Formula (10.3) as

(10.4) $\sum (Y - \tilde Y)^2 = (1 - r^2) \sum (Y - \bar Y)^2$

if we define $r$ by the formula

(10.5) $r = \dfrac{\sum (X - \bar X)(Y - \bar Y)}{\sqrt{\sum (X - \bar X)^2 \sum (Y - \bar Y)^2}}$

It becomes obvious that the value of $r$ measures the gain in
precision of estimate of an unknown Y from a known X. If $r$ is zero,
$\sum (Y - \tilde Y)^2$ is the same as $\sum (Y - \bar Y)^2$ and there is no gain in
precision. If $r$ is near +1 or near -1 the gain in precision is great,
for then $\sum (Y - \tilde Y)^2$ is much less than $\sum (Y - \bar Y)^2$.

The quantity $r$ defined in Formula (10.5) is called the coefficient
of linear correlation. The word "linear" is included because of its
relationship to straight line regression. This coefficient is often
called the product moment coefficient because the quantity
$\sum (X - \bar X)(Y - \bar Y)$ is a product moment. It is also called Pearson
r because of the important contributions of Karl Pearson.

In some problems interest centers on the estimation of scores
by means of a regression equation; other problems are concerned
chiefly with the mutual interrelationship between two variables
as measured by the coefficient of correlation. In some problems
both regression equations

$\tilde Y = \bar Y + b_{yx}(X - \bar X)$
and $\tilde X = \bar X + b_{xy}(Y - \bar Y)$

are of interest, in others only one. Thus there would be practical
value in predicting college achievement from a prognostic test
but little value in estimating prognostic test scores from measures
of college achievement. In a regression equation the relationship
is from known scores to estimated scores and is not reversible. In
correlation the relationship is mutual, and

(10.6) $r_{xy} = \sqrt{b_{yx} b_{xy}}$

with $r_{xy}$ taking the sign common to the two regression coefficients.
The order of letters in the subscript for $r$ is immaterial, $r_{xy} = r_{yx}$.
For $b$ the order is important and in general $b_{xy} \neq b_{yx}$.
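
The relation in Formula (10.6) follows directly from the definitions. In the short derivation below, $x = X - \bar X$ and $y = Y - \bar Y$ are a shorthand introduced here for brevity, and $b_{xy}$ is taken to be defined like $b_{yx}$ in Formula (10.2) with the roles of X and Y interchanged.

```latex
\begin{aligned}
b_{yx} &= \frac{\sum xy}{\sum x^2}, \qquad
b_{xy} = \frac{\sum xy}{\sum y^2}, \\[4pt]
b_{yx}\, b_{xy} &= \frac{\left(\sum xy\right)^2}{\sum x^2 \, \sum y^2}
  = \left( \frac{\sum xy}{\sqrt{\sum x^2 \, \sum y^2}} \right)^{2} = r_{xy}^2 .
\end{aligned}
```

Since both regression coefficients share the numerator $\sum xy$, they always have the same sign, and that sign is also the sign of $r_{xy}$.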

Machine Computation of $b_{yx}$ and $r_{xy}$ Without the Use of a
Scatter Diagram. To illustrate the way in which data may be
organized so that adequate checks are obtained when a computing
machine is used, a problem will be set up using a relatively small
number of cases. The members of a class in first term statistics
were given several prognostic tests at the first meeting of the
course. One of these, which is here called X, was a specially
constructed test in artificial language. The criterion variable, Y, is
the semester grade in the course. The X and Y scores for a part
of the class are listed in Table 10.1, the number of cases having
been reduced because it does not seem likely that readers of this
text will learn any more from working with a long series of data
than with a short one. The X - Y and X + Y columns are
utilized for checks.

In machine work there is no need to keep numbers small and
therefore gross scores (X and Y) are used instead of deviations
($X - \bar X$ and $Y - \bar Y$, or $X - A_x$ and $Y - A_y$). Less work is
involved and greater accuracy secured if division and square root,
which usually involve rounding error, are performed as late in
the process as possible. The following formulas for $b_{yx}$ and $r_{xy}$
meet these criteria and are equivalent to Formulas (10.2) and
(10.5):

(10.7) $b_{yx} = \dfrac{N \sum XY - (\sum X)(\sum Y)}{N \sum X^2 - (\sum X)^2}$

(10.8) and $r_{xy} = \dfrac{N \sum XY - (\sum X)(\sum Y)}{\sqrt{[N \sum X^2 - (\sum X)^2][N \sum Y^2 - (\sum Y)^2]}}$

On certain machines it is possible to enter X, Y, $X^2$, $Y^2$ and XY
in a single setting of the machine, and thus to obtain $\sum X$, $\sum Y$,
$\sum X^2$, $\sum Y^2$ and $\sum XY$ in one process (see K. Pease). Even with
a machine, some check on the correctness of the results is essential.
One possible check is to cumulate the XY products as a separate
operation. If the result agrees with $\sum XY$ previously obtained it
is unlikely that there is an error in the sums found, but this check
does not guarantee correctness. A foolproof check can be
obtained from the sums and sums of squared entries in the X - Y and
X + Y columns. If the five sums named above have already been
found by the method described by Pease, use of the X - Y column
alone would provide a complete check. Any of these formulas may
be used to check the sums:

(10.9) $\sum X + \sum Y = \sum (X + Y)$
(10.10) $\sum X - \sum Y = \sum (X - Y)$
(10.11) $\sum (X + Y) + \sum (X - Y) = 2 \sum X$
(10.12) $\sum (X + Y) - \sum (X - Y) = 2 \sum Y$


TABLE 10.1 Marks of Students in First Term Statistics Course (Y) and
Scores on an Artificial Language Test (X) Used To Predict Term Marks

Student                       X        Y      X - Y    X + Y
1                            52       62       -10      114
2                            57       50         7      107
3                            56       57        -1      113
4                            30       32        -2       62
5                            47       55        -8      102
6                            49       43         6       92
7                            60       59         1      119
8                            56       57        -1      113
9                            53       55        -2      108
10                           55       56        -1      111
11                           54       55        -1      109
12                           53       56        -3      109
13                           58       57         1      115
14                           57       48         9      105
15                           45       36         9       81
16                           46       57       -11      103
17                           60       63        -3      123
18                           37       47       -10       84
19                           24       43       -19       67
20                           43       57       -14      100
21                           47       57       -10      104
22                           50       47         3       97
23                           48       47         1       95
24                           42       49        -7       91
25                           59       51         8      110
26                           40       48        -8       88
27                           41       58       -17       99
28                           47       63       -16      110
29                           43       56       -13       99
30                           57       50         7      107
31                           32       46       -14       78
32                           53       43        10       96
33                           46       39         7       85
34                           54       47         7      101
35                           49       48         1       97
36                           58       55         3      113
37                           55       47         8      102
38                           14       35       -21       49
39                           50       62       -12      112
40                           41       59       -18      100
41                           14       29       -15       43
42                           46       54        -8      100

Sum of scores              1978     2135      -157     4113
Sum of squares of scores  98232   111399      3937   415325


These formulas may be used to check sums of squares and sums of
products:

(10.13) $\sum (X + Y)^2 + \sum (X - Y)^2 = 2 \sum X^2 + 2 \sum Y^2$
(10.14) $\sum XY = \frac{1}{4} [\sum (X + Y)^2 - \sum (X - Y)^2]$

The checks of Formulas (10.9) to (10.13) all hold for the data
of Table 10.1. Then we may compute $\sum XY$ by (10.14) as

$\frac{1}{4} [415325 - 3937] = 102847$

The following values have now been obtained:

$N = 42$          $\sum XY = 102847$   $\sum X^2 = 98232$
$\sum X = 1978$   $\sum Y = 2135$      $\sum Y^2 = 111399$
$\bar X = 47.10$  $\bar Y = 50.83$

From these the following computations are made:

$N^2 s_{xy} = N \sum XY - (\sum X)(\sum Y) = 42(102847) - (1978)(2135) = 96544$
$N^2 s_x^2 = N \sum X^2 - (\sum X)^2 = 42(98232) - (1978)^2 = 213260$
$N^2 s_y^2 = N \sum Y^2 - (\sum Y)^2 = 42(111399) - (2135)^2 = 120533$

$b_{yx} = \dfrac{96544}{213260} = .453$

$r_{xy} = \dfrac{96544}{\sqrt{(213260)(120533)}} = .602$

$\tilde Y = 50.83 + .453(X - 47.10)$
$\tilde Y = .453X + 29.5$
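
The machine routine, including the checks of Formulas (10.9) to (10.14), might be sketched in Python as follows. The scores are those of Table 10.1 as we read them; student 23's mark is taken as 47, an assumption consistent with the printed X + Y of 95 and with the column total of 2135. Variable names are ours.

```python
import math

# X (artificial language test) and Y (term mark) for the 42 students of Table 10.1.
# Assumption: student 23 has Y = 47, which agrees with the printed X + Y = 95.
X = [52, 57, 56, 30, 47, 49, 60, 56, 53, 55, 54, 53, 58, 57,
     45, 46, 60, 37, 24, 43, 47, 50, 48, 42, 59, 40, 41, 47,
     43, 57, 32, 53, 46, 54, 49, 58, 55, 14, 50, 41, 14, 46]
Y = [62, 50, 57, 32, 55, 43, 59, 57, 55, 56, 55, 56, 57, 48,
     36, 57, 63, 47, 43, 57, 57, 47, 47, 49, 51, 48, 58, 63,
     56, 50, 46, 43, 39, 47, 48, 55, 47, 35, 62, 59, 29, 54]
N = len(X)

# The five sums a computing machine would accumulate.
sum_x, sum_y = sum(X), sum(Y)
sum_x2 = sum(x * x for x in X)
sum_y2 = sum(y * y for y in Y)
sum_xy = sum(x * y for x, y in zip(X, Y))

# Check columns X - Y and X + Y.
diff = [x - y for x, y in zip(X, Y)]
both = [x + y for x, y in zip(X, Y)]
sum_diff2 = sum(d * d for d in diff)
sum_both2 = sum(t * t for t in both)

checks = {
    "(10.9)": sum_x + sum_y == sum(both),
    "(10.10)": sum_x - sum_y == sum(diff),
    "(10.11)": sum(both) + sum(diff) == 2 * sum_x,
    "(10.12)": sum(both) - sum(diff) == 2 * sum_y,
    "(10.13)": sum_both2 + sum_diff2 == 2 * sum_x2 + 2 * sum_y2,
    "(10.14)": 4 * sum_xy == sum_both2 - sum_diff2,
}

# Formulas (10.7) and (10.8): division and square root come last.
num = N * sum_xy - sum_x * sum_y
den_x = N * sum_x2 - sum_x ** 2
den_y = N * sum_y2 - sum_y ** 2
b_yx = num / den_x
r_xy = num / math.sqrt(den_x * den_y)
intercept = sum_y / N - b_yx * sum_x / N

print(checks)
print(f"b_yx = {b_yx:.3f}, r_xy = {r_xy:.3f}, Y~ = {b_yx:.3f} X + {intercept:.1f}")
```

Working in whole numbers until the final division means the checks can be tested for exact equality, which is the advantage the text claims for postponing division and square root.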


To see the effect of the regression equation we may apply it
to the X scores of the first 5 cases in Table 10.1 to see what Y
score would have been estimated for them if that Y score had
happened to be unknown.

Student     X     $\tilde Y$     Y     $Y - \tilde Y$
1          52       53.1        62         8.9
2          57       55.3        50        -5.3
3          56       54.9        57         2.1
4          30       43.1        32       -11.1
5          47       50.8        55         4.2

In no case did the estimate of Y based on X agree precisely with
the obtained Y; the residual error $Y - \tilde Y$ was sometimes positive
and sometimes negative. On the whole the deviation of Y from
the regression value ($Y - \tilde Y$) seems to be smaller than its
deviation from the mean ($Y - \bar Y$). A method of measuring the efficiency
of prediction will be discussed on page 244.


Computation from a Scatter Diagram · 237 


Computation of bys and т, from a Scatter Diagram. Even 
when a machine is available many statisticians prefer to work 
from a scatter diagram in order that they may have a general 
impression of the linearity of the relationship. Those who prefer 
to work directly from the list of raw scores sometimes need to 
compute when no machine is available, and then they usually 
make use of a scatter diagram. Coding the scores reduces the 
size of the numbers involved and thus cuts down on labor. 

To code scores, an arbitrary origin is taken at the midpoint 
of some interval. This interval is usually either the lowest interval 
in the distribution so that all deviations will be positive or it is 
near the middle of the distribution so that deviations will be 
numerically small. This arbitrary origin will be called A, and
when more than one variable is being studied it will have a
subscript as $A_x$, $A_y$, or $A_z$. Then if $X_i$ and $Y_j$ are the
midpoints of the class intervals, and $i_x$ and $i_y$ are the widths
of those intervals, we may define the coded scores as

(10.15) $$x' = \frac{X_i - A_x}{i_x} \quad \text{and} \quad y' = \frac{Y_j - A_y}{i_y}$$

The actual process of coding is merely to label one interval
as 0, successive intervals above it as 1, 2, 3, . . . and successive
intervals below it, if there are such, as $-1$, $-2$, $-3$, . . . .

The symbol $f_{xy}$ will denote the frequency in a cell in the body
of the scatter diagram, $f_x$ a marginal frequency in an interval of
the horizontal variable, $f_y$ a marginal frequency in an interval of
the vertical variable.

The formulas for $r$ and $b_{yx}$ based upon coded scores are

(10.16) $$b_{yx} = \frac{i_y}{i_x} \cdot \frac{N\sum f_{xy}x'y' - (\sum f_x x')(\sum f_y y')}{N\sum f_x (x')^2 - (\sum f_x x')^2}$$

(10.17) and $$r_{xy} = \frac{N\sum f_{xy}x'y' - (\sum f_x x')(\sum f_y y')}{\sqrt{[N\sum f_x (x')^2 - (\sum f_x x')^2]\,[N\sum f_y (y')^2 - (\sum f_y y')^2]}}$$

There are many different routines by which to compute the
sums of scores, sums of squared scores, and sums of products
which enter into Formulas (10.16) and (10.17). The method
shown in Table 10.2 has the advantages of celerity, directness,
and partial checks. The values

$$N = \sum f_x = \sum f_y \qquad C = \sum f_y y'$$

$$A = \sum f_x x' \qquad D = \sum f_{xy}x'y'$$

are computed twice and so their accuracy is beyond doubt. The
values of $\sum f_y (y')^2$ and $\sum f_x (x')^2$ are not checked by this method.
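A small sketch may help show why coding is harmless. The scatter-diagram frequencies, origins and interval widths below are hypothetical, chosen only for illustration. The sketch computes the slope and the correlation once from the interval midpoints and once from the coded scores, rescaling the coded slope by $i_y / i_x$.

```python
import math

# Hypothetical scatter-diagram cells: (X midpoint, Y midpoint) -> frequency
cells = {
    (22.5, 54.0): 2, (27.5, 54.0): 1,
    (27.5, 57.0): 3, (32.5, 57.0): 2,
    (32.5, 60.0): 4, (37.5, 60.0): 2,
    (37.5, 63.0): 3, (42.5, 63.0): 1,
}
A_X, I_X = 32.5, 5.0  # assumed arbitrary origin and interval width for X
A_Y, I_Y = 60.0, 3.0  # assumed arbitrary origin and interval width for Y


def slope_and_r(points, scale=1.0):
    """Return (slope, r) from weighted (u, v, f) triples.

    scale is i_y / i_x when u and v are coded scores, and 1 for raw scores.
    """
    n = sum(f for _, _, f in points)
    su = sum(f * u for u, _, f in points)
    sv = sum(f * v for _, v, f in points)
    suu = sum(f * u * u for u, _, f in points)
    svv = sum(f * v * v for _, v, f in points)
    suv = sum(f * u * v for u, v, f in points)
    num = n * suv - su * sv
    den_u = n * suu - su ** 2
    den_v = n * svv - sv ** 2
    return scale * num / den_u, num / math.sqrt(den_u * den_v)


raw = [(x, y, f) for (x, y), f in cells.items()]
coded = [((x - A_X) / I_X, (y - A_Y) / I_Y, f) for x, y, f in raw]

b_raw, r_raw = slope_and_r(raw)
b_coded, r_coded = slope_and_r(coded, scale=I_Y / I_X)
print(round(b_coded, 4), round(r_coded, 4))
```

The coded scores are small integers, yet both routes give the same slope and the same correlation, because $r$ is unaffected by a change of origin and unit and the slope only needs the factor $i_y / i_x$.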


TABLE 10.2 Computation of Correlation and Regression Coefficients from a Scatter Diagram
[Table body not legible in this copy.]




For a large-scale enterprise, the authors recommend that com- 
putations be made by I B M machines. For a smaller study when 
a scatter diagram is to be used, they recommend using one of 
the various published charts which provides for a complete check 
at every step, such as the Durost-Walker Chart published by the 
World Book Company. 

Grouping scores into class intervals tends to produce a slight 
increase in variance and so a slight decrease in r. The error thus 
created is negligible if there are 12 or more intervals for each 
variable and N is large. The error is likely to be greater when 
N is small and when intervals are broad. In the present problem 
such grouping error accounts for the discrepancy between r = .602 
when computed directly from raw scores and r = .597 when com- 
puted from data grouped into class intervals. 

A Mathematical Model for Linear Regression. The mathematical
model which is commonly used in dealing with regression
estimates of Y is one in which the values of X are the same for
all possible samples as they are for the observed sample.
Therefore the population mean of X is $\bar{X} = \sum X / N$ as this
quantity is calculated from the sample of observations. The X's
are sometimes called known parameters.


The Y's, however, are random variables. For each observed
X, the population distribution of Y is normal with a mean which
depends on X and a variance which is the same for each X. The
population mean of Y for a given X is written

(10.18) $$\mu_{y.x} = \mu_y + \beta_{yx}(X - \bar{X})$$

The constants $\mu_y$, $\beta_{yx}$ and $\bar{X}$ are to be estimated from the sample.
Of these $\bar{X}$ is fully determined from the sample as the mean of
X's, and this mean is not an estimate of a parameter but is in
fact that parameter. The remaining constants are unknown
parameters which are merely estimated from the sample. The
estimate of $\mu_y$ is $\bar{Y} = \sum Y / N$. The estimate of $\beta_{yx}$ is $b_{yx}$, as
defined in Formula (10.2).

The variance of Y for a fixed X is

(10.19) $$\sigma^2_{y.x} = \sigma_y^2(1 - \rho^2)$$

where $\sigma_y^2$ and $\rho$ are the variance and the correlation coefficient in
the population. The standard deviation of Y for a fixed X,

(10.20) $$\sigma_{y.x} = \sigma_y\sqrt{1 - \rho^2}$$

is commonly called the standard error of estimate of Y on X. It
is the standard deviation in the population of the residual errors
$Y - \mu_{y.x}$.

The best sample estimate of $\sigma^2_{y.x}$ is

(10.21) $$s^2_{y.x} = \frac{1}{N-2}\sum_{\alpha=1}^{N}(Y_\alpha - \tilde{Y}_\alpha)^2$$

(10.22) $$= \frac{N-1}{N-2}\,s_y^2(1 - r^2)$$

The symbol $s_{y.x}$ had been used by statisticians a long time before
they began to take the idea of degrees of freedom seriously, and
had been defined as $s_{y.x} = s_y\sqrt{1 - r^2}$. However $s_y^2(1 - r^2)$ is a
biased estimate of $\sigma_y^2(1 - \rho^2)$ and multiplication by the fraction
$\frac{N-1}{N-2}$ corrects the bias. Unless N is small, $\frac{N-1}{N-2}$ is so near unity
that the traditional definition is nearly as good as the correct one.
A convenient formula for computing $s_{y.x}$ is

(10.23) $$s_{y.x} = \sqrt{\frac{\sum x^2 \sum y^2 - (\sum xy)^2}{(N-2)\sum x^2}}$$
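The agreement among Formulas (10.21), (10.22) and (10.23) can be checked numerically. The sketch below uses six hypothetical pairs of scores (not data from the text) and also computes the traditional value $s_y\sqrt{1 - r^2}$ for comparison.

```python
import math

# Hypothetical paired scores, used only to illustrate the formulas
X = [2, 4, 5, 7, 8, 10]
Y = [3, 6, 5, 9, 8, 12]
N = len(X)

mean_x = sum(X) / N
mean_y = sum(Y) / N
sum_x2 = sum((x - mean_x) ** 2 for x in X)   # sum of squared deviations of X
sum_y2 = sum((y - mean_y) ** 2 for y in Y)   # sum of squared deviations of Y
sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))

b_yx = sum_xy / sum_x2
r = sum_xy / math.sqrt(sum_x2 * sum_y2)
s_y2 = sum_y2 / (N - 1)                      # sample variance of Y

# (10.21): sum of squared residuals divided by N - 2
residual_ss = sum((y - (mean_y + b_yx * (x - mean_x))) ** 2
                  for x, y in zip(X, Y))
s2_from_residuals = residual_ss / (N - 2)

# (10.22): the same quantity expressed through s_y^2 and r
s2_from_r = (N - 1) / (N - 2) * s_y2 * (1 - r ** 2)

# (10.23): computing formula based on sums of squares and products
s_from_sums = math.sqrt((sum_x2 * sum_y2 - sum_xy ** 2) / ((N - 2) * sum_x2))

# Traditional definition, without the (N - 1)/(N - 2) correction
s_traditional = math.sqrt(s_y2 * (1 - r ** 2))

print(round(s_from_sums, 4), round(s_traditional, 4))
```

With so small an N the correction factor $(N-1)/(N-2) = 5/4$ is far from unity, so the traditional value is visibly smaller than the corrected one.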


Sampling Distributions of Statistics in Linear Regression.
The distributions of the statistics $\bar{Y}$, $b_{yx}$, and $s^2_{y.x}$ must be known
in order to test hypotheses or make interval estimates for the
parameters $\mu_y$, $\beta_{yx}$, and $\sigma^2_{y.x}$. These distributions can be derived
mathematically from the mathematical model for regression. The
constant $\bar{X}$, being by assumption the same for all samples, is a
parameter which has no variability and no sampling distribution.

1. The mean $\bar{Y}$. Imagine a total population from which
repeated samples of N cases are drawn by unrestricted random
sampling. As was stated in Chapter 7, the distribution of $\bar{Y}$
would then be normal with mean $\mu_y$ and standard error $\sigma_y/\sqrt{N}$
and the statistic

$$t = \frac{(\bar{Y} - \mu_y)\sqrt{N}}{s_y}$$

would have "Student's" distribution. This is a familiar relation.
However, the value of $\bar{Y}$ which occurs in the regression equation
is the mean of Y values in a sample which has a particular fixed
set of X values, and $\mu_y$ is the value of the mean of a subpopulation
which includes individuals with those X values and no others.
Therefore it is reasonable that if Y is related to X the $\bar{Y}$ in the
regression equation would have a standard error smaller than
$\sigma_y/\sqrt{N}$. The subscript x may be used, writing $\bar{Y}_x$ for the mean of
a sample with fixed set of X values and $\mu_y$ for the corresponding
population value. Then $\bar{Y}_x$ is normally distributed around $\mu_y$ with
standard error

(10.24) $$\sigma_{\bar{Y}_x} = \frac{\sigma_{y.x}}{\sqrt{N}} = \sigma_y\sqrt{\frac{1 - \rho^2}{N}}$$

The statistic

(10.25) $$t = \frac{(\bar{Y}_x - \mu_y)\sqrt{N}}{s_{y.x}}$$

has "Student's" distribution with $N - 2$ degrees of freedom.
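2. The regression coefficient $b_{yx}$. The sample regression
coefficient $b_{yx}$ is normally distributed around $\beta_{yx}$. The sample
estimate of its standard error is

(10.26) $$s_b = \frac{s_{y.x}}{s_x\sqrt{N-1}}$$

(10.27) and $$t = \frac{b_{yx} - \beta_{yx}}{s_b} = \frac{(b_{yx} - \beta_{yx})\,s_x\sqrt{N-1}}{s_{y.x}}$$

has "Student's" distribution with $N - 2$ degrees of freedom.
To test $H: \beta_{yx} = 0$, the t ratio reduces to

(10.28) $$t = \frac{r\sqrt{N-2}}{\sqrt{1 - r^2}}$$

which is equal to

(10.29) $$t = \frac{\sum xy\,\sqrt{N-2}}{\sqrt{\sum x^2 \sum y^2 - (\sum xy)^2}}$$

3. The standard error of estimate $s_{y.x}$. The statistic

(10.30) $$\frac{(N-2)\,s^2_{y.x}}{\sigma^2_{y.x}} = \frac{1}{\sigma^2_{y.x}}\left[\sum y^2 - \frac{(\sum xy)^2}{\sum x^2}\right]$$

has the chi-square distribution with $N - 2$ degrees of freedom.

Confidence Interval for $\mu_y$, $\beta_{yx}$, $\sigma_{y.x}$ and $\mu_{y.x}$. Interval
estimates can readily be developed from Formulas (10.25), (10.27), and
(10.30). If $1 - \alpha$ is the confidence coefficient, these estimates are

(10.31) $$\bar{Y}_x + \frac{s_{y.x}}{\sqrt{N}}\,t_{\alpha/2} < \mu_y < \bar{Y}_x + \frac{s_{y.x}}{\sqrt{N}}\,t_{1-\alpha/2}$$

(10.32) $$b_{yx} + \frac{s_{y.x}}{s_x\sqrt{N-1}}\,t_{\alpha/2} < \beta_{yx} < b_{yx} + \frac{s_{y.x}}{s_x\sqrt{N-1}}\,t_{1-\alpha/2}$$

(10.33) $$\frac{(N-2)\,s^2_{y.x}}{\chi^2_{1-\alpha/2}} < \sigma^2_{y.x} < \frac{(N-2)\,s^2_{y.x}}{\chi^2_{\alpha/2}}$$

(10.34) $$\tilde{Y}_x + s_{y.x}\,t_{\alpha/2}\sqrt{\frac{1}{N} + \frac{(X - \bar{X})^2}{(N-1)s_x^2}} < \mu_{y.x} < \tilde{Y}_x + s_{y.x}\,t_{1-\alpha/2}\sqrt{\frac{1}{N} + \frac{(X - \bar{X})^2}{(N-1)s_x^2}}$$

For each value of X the confidence interval for
$\mu_{y.x} = \mu_y + \beta_{yx}(X - \bar{X})$
is a pair of points given by Formula (10.34). These points lie on
two curved lines, one of which is above and one below the sample
line of regression. These curved lines are closest to the sample
line when $X = \bar{X}$ and depart further from it as X departs from $\bar{X}$.
The t-table is entered with $N - 2$ degrees of freedom.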
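The way the interval in Formula (10.34) widens away from $\bar{X}$ can be seen in a minimal sketch. The ten pairs of scores are hypothetical, and the critical value 2.306 (the upper 2.5 per cent point of t with 8 degrees of freedom) is supplied by hand as if read from a t table; the function name is illustrative only.

```python
import math


def mean_response_interval(x0, xs, ys, t_crit):
    """Interval of Formula (10.34) for the population mean of Y at X = x0.

    t_crit is the upper critical value of t with N - 2 degrees of freedom;
    the lower critical value is its negative.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_x2 = sum((x - mean_x) ** 2 for x in xs)   # equals (N - 1) * s_x^2
    sum_y2 = sum((y - mean_y) ** 2 for y in ys)
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b_yx = sum_xy / sum_x2
    s_yx = math.sqrt((sum_y2 - sum_xy ** 2 / sum_x2) / (n - 2))
    estimate = mean_y + b_yx * (x0 - mean_x)
    half_width = t_crit * s_yx * math.sqrt(1 / n + (x0 - mean_x) ** 2 / sum_x2)
    return estimate - half_width, estimate + half_width


# Hypothetical scores for ten individuals
xs = [30, 35, 39, 42, 47, 50, 52, 56, 57, 62]
ys = [32, 41, 40, 47, 55, 49, 62, 57, 50, 60]
T_975_DF8 = 2.306   # t table value for 8 degrees of freedom, 95 per cent interval

at_mean = mean_response_interval(sum(xs) / len(xs), xs, ys, T_975_DF8)
at_39 = mean_response_interval(39, xs, ys, T_975_DF8)
print(at_mean, at_39)
```

The interval at $X = \bar{X}$ is the narrowest; at X = 39 the extra term $(X - \bar{X})^2 / [(N-1)s_x^2]$ makes it wider, which produces the two curved boundary lines.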
EXERCISE 10.1
Use the data from Table 10.1 (p. 235) to answer questions 1 and 2.
1. Test these hypotheses: (a) $\beta_{yx} = 0$; (b) $\beta_{yx} = .6$; (c) $\mu_y = 52$.
2. With $\alpha = .05$ find an interval estimate (a) for $\beta_{yx}$, (b) for $\mu_y$,
(c) for $\sigma^2_{y.x}$, (d) for $\mu_{y.x}$ for an individual whose X score is 39.
3. The following data are made artificially simple in order to enable
you to make the necessary computations very easily and concentrate
your attention on the method.

$N = 27$            $\sum x'y' = -83$      $i_x = 5$     $A_x = 32.5$
$\sum x' = -45$       $\sum y' = 81$         $i_y = 3$     $A_y = 60$
$\sum (x')^2 = 219$   $\sum (y')^2 = 412$

(a) Find $\bar{X}$ and $\bar{Y}$

(b) Find $b_{yx}$

(c) Find the regression equation to predict Y from X

(d) Find $s_{y.x}$

(e) With confidence coefficient .99, what is the interval estimate for
(1) $\beta_{yx}$, (2) $\mu_y$, (3) $\sigma^2_{y.x}$

(f) Test the hypothesis (1) $\beta_{yx} = 0$, (2) $\beta_{yx} \le 0$, (3) $\beta_{yx} = .05$,
all at significance level .01.


Component Sums of Squares in Regression. The Y score of
any given individual ($Y_\alpha$) is the sum of three components: (1) the
mean of the entire Y distribution ($\bar{Y}$); (2) a regression estimate
($\tilde{Y}_\alpha - \bar{Y}$) of his deviation from that mean made on the basis of his
deviation from the mean of X; and (3) the deviation of his score
from the regression estimate, or the residual error ($Y_\alpha - \tilde{Y}_\alpha$). In
symbols

$$Y_\alpha = (\bar{Y}) + (\tilde{Y}_\alpha - \bar{Y}) + (Y_\alpha - \tilde{Y}_\alpha)$$

or

$$Y_\alpha - \bar{Y} = (\tilde{Y}_\alpha - \bar{Y}) + (Y_\alpha - \tilde{Y}_\alpha)$$

Therefore

$$(Y_\alpha - \bar{Y})^2 = (\tilde{Y}_\alpha - \bar{Y})^2 + 2(\tilde{Y}_\alpha - \bar{Y})(Y_\alpha - \tilde{Y}_\alpha) + (Y_\alpha - \tilde{Y}_\alpha)^2$$

When this expression is summed for all individuals in the sample,
the cross product term is found to be identically equal to zero.
Then we have the very important relation

(10.35) $$\sum (Y_\alpha - \bar{Y})^2 = \sum (\tilde{Y}_\alpha - \bar{Y})^2 + \sum (Y_\alpha - \tilde{Y}_\alpha)^2$$

which may be put into words as follows:

The sum of the squares of the deviations of the observations from
the mean of all observations is equal to the sum of the squares
of the deviations of the regression estimates from the mean of all
observations and the sum of the squares of the deviations of the
observations from their regression estimates.

The left-hand member of Formula (10.35) is already well known
as equal to $(N - 1)s_y^2$. The two expressions on the right can be
reduced to expressions in $s_y^2$ and $r$:

(10.36) $$\sum (\tilde{Y}_\alpha - \bar{Y})^2 = (N - 1)\,r^2 s_y^2$$

(10.37) and $$\sum (Y_\alpha - \tilde{Y}_\alpha)^2 = (N - 1)(1 - r^2)\,s_y^2$$

The algebra which leads to these results will not be reproduced
here but it consists in substituting for $\tilde{Y}$ its value from Formulas
(10.1) and (10.2), squaring the binomial, summing and combining
similar terms. Then

(10.38) $$(N - 1)s_y^2 = (N - 1)\,r^2 s_y^2 + (N - 1)(1 - r^2)\,s_y^2$$

(10.39) and $$s_y^2 = r^2 s_y^2 + (1 - r^2)\,s_y^2$$

This is an important relationship leading to an interpretation of
$r$. The total variance among the Y scores, $s_y^2$, has been divided
into two components. The component $r^2 s_y^2$ comes from variation
of $\tilde{Y}_\alpha$ about $\bar{Y}$ and so is the portion due to regression. The
component $(1 - r^2)s_y^2$ comes from the variation of $Y_\alpha$ about $\tilde{Y}_\alpha$, that
is the variation of scores about the regression line, and is variation
due to some source other than regression on X. Therefore

$r^2$ is the proportion of the Y variance attributable to the
relation of Y to X

$1 - r^2$ is the proportion of the Y variance not attributable to the
relation of Y to X.

The relationship in (10.38) may be regarded as a partition of 
the sum of squares of the same general nature as the partition 
described in Chapter 9 as analysis of variance. This partition is 
displayed in Table 10.3. 


TABLE 10.3 Analysis of Variance in Regression

Source of variation              Degrees of    Sum of                    Mean
                                 freedom       squares                   square

Regression estimates
  around $\bar{Y}$                      $1$           $(N-1)r^2 s_y^2$           $(N-1)r^2 s_y^2$
Observations around
  regression estimates           $N-2$         $(N-1)(1-r^2)s_y^2$        $\frac{N-1}{N-2}(1-r^2)s_y^2$

Observations around $\bar{Y}$          $N-1$         $(N-1)s_y^2$               $s_y^2$

From the two components of the sum of squares an F ratio
can be obtained to test the hypothesis that the regression
coefficient is zero in the population, so X is of no value at all in
estimating Y.

If $\beta_{yx} = 0$,

$$F = \frac{(N-1)\,r^2 s_y^2}{1} \div \frac{(N-1)(1 - r^2)\,s_y^2}{N-2}$$

(10.40) or $$F = \frac{r^2(N-2)}{1 - r^2}$$

Since F has 1 degree of freedom for the numerator variance and
$N - 2$ for the denominator,

(10.28) $$t = \sqrt{F} = r\sqrt{\frac{N-2}{1 - r^2}}$$ with $N - 2$ degrees of freedom.

Thus the formula already given as (10.28) on page 241 has now
been reached again by analysis of variance.
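The partition in Formula (10.35) and the equivalence of F and $t^2$ can be verified on any set of paired scores. The sketch below uses ten hypothetical pairs (not data from the text) and checks each identity numerically.

```python
import math

# Hypothetical paired scores, used only to illustrate the partition
X = [30, 35, 39, 42, 47, 50, 52, 56, 57, 62]
Y = [32, 41, 40, 47, 55, 49, 62, 57, 50, 60]
N = len(X)

mean_x = sum(X) / N
mean_y = sum(Y) / N
sum_x2 = sum((x - mean_x) ** 2 for x in X)
sum_y2 = sum((y - mean_y) ** 2 for y in Y)
sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))

b_yx = sum_xy / sum_x2
r = sum_xy / math.sqrt(sum_x2 * sum_y2)
estimates = [mean_y + b_yx * (x - mean_x) for x in X]

# Three sums of squares of Formula (10.35)
ss_total = sum_y2
ss_regression = sum((e - mean_y) ** 2 for e in estimates)
ss_residual = sum((y - e) ** 2 for y, e in zip(Y, estimates))

# F ratio from the mean squares, and from Formula (10.40)
F_from_ss = (ss_regression / 1) / (ss_residual / (N - 2))
F_from_r = r ** 2 * (N - 2) / (1 - r ** 2)

# t ratio of Formula (10.28)
t = r * math.sqrt(N - 2) / math.sqrt(1 - r ** 2)

print(round(ss_regression / ss_total, 4), round(r ** 2, 4))
print(round(F_from_ss, 3), round(t ** 2, 3))
```

The proportion of the total sum of squares carried by the regression estimates equals $r^2$, and the F computed from the mean squares equals the square of t.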

This partition of the sum of squares shows the effect of the
size of the correlation coefficient upon the precision of estimate.
If r is large, most of the variance in the Y scores can be attributed
to the information contributed by the regression equation. If r
is small, little information is contributed by this equation. The
variation must then be explained by other means if at all. The
F test may be used to determine whether the information yielded
by the regression study makes any contribution at all.

Sometimes F is to be computed directly from original data
and the value of r is not needed for any other purpose. In that
case, ease of computation and desire to minimize the effect of
rounding errors make it expedient to use Formula (10.41) which
is equivalent to (10.40) and closely related to (10.28).

(10.41) $$F = \frac{r^2(N-2)}{1 - r^2} = \frac{(\sum xy)^2 (N-2)}{\sum x^2 \sum y^2 - (\sum xy)^2}$$

Test for Linearity of Regression. If regression is truly linear
in the population, the mean $\mu_j$ of all values of Y which are
associated with a particular $X_j$ will lie precisely on the population
regression line: $\mu_{y.x} = \mu_y + \beta_{yx}(X - \bar{X})$. In this case, $\mu_j - \mu_{y.x} = 0$
for every j. However, even when regression is linear in the
population, the sample values of $\bar{Y}_j$ will not agree precisely with the
population values of $\mu_j$ and will not all lie on the sample regression
line $\tilde{Y} = \bar{Y} + b_{yx}(X - \bar{X})$. It is apparent that a test of the
hypothesis $\mu_j - \mu_{y.x} = 0$ should involve the sum of squares of the
sample differences $\bar{Y}_j - \tilde{Y}_j$.

To simplify the formulas needed we shall use these additional
symbols. Here k is the number of columns (class intervals of X)
in the scatter diagram and $N_j$ is the number of cases in column j.

$T$ = sum of all $y'$ scores in table $= \sum_{\alpha=1}^{N} y'_\alpha$

$T_j = \sum_{\alpha=1}^{N_j} y'_{j\alpha}$ = sum of all $y'$ scores associated with $X_j$

$C$ = sum of squares of Y scores about regression $= \sum_{j=1}^{k}\sum_{\alpha=1}^{N_j} (Y_{j\alpha} - \tilde{Y}_j)^2$

$W$ = sum of squares of Y scores about column means

(10.42) $$W = \sum_{j=1}^{k}\sum_{\alpha=1}^{N_j} (Y_{j\alpha} - \bar{Y}_j)^2$$

$M$ = sum of squares of Y means about regression line

(10.43) $$M = \sum_{j=1}^{k} N_j(\bar{Y}_j - \tilde{Y}_j)^2$$

As W and M are independently distributed, they can be used
with their degrees of freedom in an F test, and $W + M = C$.

C has $N - 2$ degrees of freedom
W has $N - k$ degrees of freedom
and M has $k - 2$ degrees of freedom

The statistic

(10.44) $$F = \frac{M/(k-2)}{W/(N-k)} = \frac{M}{W} \cdot \frac{N-k}{k-2}$$

provides a test for the hypothesis of linearity. Formulas to
facilitate the computation of M and W from an arbitrary origin
are needed.

(10.45) $$C = \sum y^2 - \frac{(\sum xy)^2}{\sum x^2}$$

where x and y are deviations from means, not from arbitrary
origins.

(10.46) $$W = i_y^2\left[\sum (y')^2 - \sum_{j=1}^{k}\frac{(T_j)^2}{N_j}\right]$$

(10.47) $$M = C - W$$


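These formulas will now be applied to the data in Table 10.2.

$$N\sum y^2 = 9(13172)$$
$$N\sum x^2 = 9(23292)$$
$$N\sum xy = 9(10452)$$

$$C = \frac{1}{42}\left[9(13172) - \frac{81(10452)^2}{9(23292)}\right] = 1817.53$$

$$\sum_{j=1}^{13}\frac{(T_j)^2}{N_j} = \frac{(T_1)^2}{N_1} + \cdots + \frac{(T_{13})^2}{N_{13}} = 2386.62$$

$$W = 9(2514 - 2386.62) = 1146.42$$

$$M = 1817.53 - 1146.42 = 671.11$$

$$F = \frac{671.11}{13 - 2} \div \frac{1146.42}{42 - 13} = \frac{671.11}{1146.42} \cdot \frac{29}{11} = 1.54$$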
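The arithmetic of this application can be retraced from the sums quoted in the text. In the sketch below the interval width $i_y = 3$ is an assumption inferred from the factor 9 that multiplies each sum; all other figures are those given above, and the variable names are illustrative.

```python
import math

N, k = 42, 13                 # number of cases and number of columns

# Sums quoted in the text (each already multiplied by N)
N_sum_y2 = 9 * 13172          # N * sum of y^2
N_sum_x2 = 9 * 23292          # N * sum of x^2
N_sum_xy = 9 * 10452          # N * sum of xy

sum_yprime_sq = 2514          # sum of (y')^2 in coded units
sum_T2_over_N = 2386.62       # sum over columns of T_j^2 / N_j
I_Y = 3                       # assumed interval width of Y (i_y^2 = 9)

# (10.45) with numerator and denominator both carrying the factor N
C = (N_sum_y2 - N_sum_xy ** 2 / N_sum_x2) / N
# (10.46)
W = I_Y ** 2 * (sum_yprime_sq - sum_T2_over_N)
# (10.47)
M = C - W
# (10.44)
F = (M / (k - 2)) / (W / (N - k))

# Correlation from the same grouped sums; the common factor 9 cancels
r_grouped = 10452 / math.sqrt(13172 * 23292)

print(round(C, 2), round(W, 2), round(M, 2), round(F, 2), round(r_grouped, 3))
```

The last value reproduces the correlation of .597 reported earlier for the grouped data, which is a useful check that the quoted sums have been read correctly.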


Bivariate Population Model. The regression model described 
on page 239 is appropriate to use when interest centers around 
the problem of estimating one unknown variable from another 
known variable. In problems about the mutual relationship of 
two variables or in problems in which it is necessary to find an 
estimate of either variable from the other, that model will not 
suffice. For example, suppose two tests, X and Y, have been given 
to a group of subjects on different days, a few subjects being 
absent on one day and a few on the other. It is for some reason 
not feasible to obtain the missing scores by administering the test 
again. From the subjects for whom both scores are available,
two regression equations are computed, one to estimate the missing
scores in X from observed scores in Y and the other to estimate the 
missing scores in Y from observed scores in X. 

In the mathematical model used for problems of this sort each 
individual is a pair of observations. A sample is then a sample of 
pairs from the population of paired observations. In this model 
both X’s and Y’s vary randomly from sample to sample, whereas 
in the regression model previously described the X’s are fixed for 
all samples and only the Y’s vary. A population of this type is 
called bivariate. 

The probability distribution of a population on a single con- 
tinuous variate, that is a univariate population, was discussed in 
Chapter 5. That distribution consists of all the points on a line 
segment (which may or may not extend to infinity) and proba- 
bilities corresponding to specified intervals on that line. Those 
probabilities are represented by the areas under the probability 
curve between two ordinates. 

For a continuous bivariate population, the scales of the two 
traits will be represented by two coordinate axes, and each pair 
of scores for one individual will locate a point in the plane of those 
axes. The bivariate probability distribution then consists of all 
points in a portion of this plane (which portion may or may not 
extend to infinity) and the appropriate probability values. These 
probability values are represented by volumes. The probability 
that $a < X < b$ and simultaneously $c < Y < d$ is written

$$P\{a < X < b,\; c < Y < d\}$$

This probability is a volume within a solid figure whose base and
4 sides are planes but whose top may not be a plane. The X, Y
scales, the intervals and probability are shown in Figure 10-1.
The probabilities of the type shown in the figure may all be
described as volumes under a surface which has a mathematical
form.

FIG. 10-1. Segment of volume representing the probability
$P\{a < X < b,\; c < Y < d\}$ in a bivariate distribution.

In any bivariate population, each variable has its own moments
and so the probability distribution involves the four parameters
$\mu_x$, $\mu_y$, $\sigma_x$, and $\sigma_y$. In addition the bivariate population has a
product moment which is the expectation $E(X - \mu_x)(Y - \mu_y)$. This
product moment is also called the covariance. The correlation
coefficient of X and Y in the population is defined as

(10.48) $$\rho_{xy} = \frac{E[(X - \mu_x)(Y - \mu_y)]}{\sigma_x \sigma_y}$$


Normal Bivariate Population. In this chapter we shall be 
concerned with only one type of bivariate population, namely 
the normal bivariate, which is an extension of the univariate normal 
population to the bivariate situation. The tests of significance 
described in the remainder of this chapter are based on the assump- 
tion of such a population and do not necessarily apply if the popula- 
tion differs too greatly from that form. 

For the normal univariate distribution of X, every point on a
line represents a possible value of X. To set up the probability
curve for X, an axis is drawn perpendicular to the scale of X. On
this axis the ordinate (which we shall call u) is measured to
represent the probability density of X. Then as already discussed in
Chapter 5,

(10.49) $$u = \frac{1}{\sigma_x\sqrt{2\pi}}\, e^{-\frac{(X - \mu_x)^2}{2\sigma_x^2}}$$

This curve has two parameters, $\mu_x$ and $\sigma_x$. The ordinate u is not
a probability because probability would be represented by the
area under the normal curve between two ordinates, but it may
be thought of as something like the average height of such an area
when its base is very small so that the two ordinates are nearly
the same.

For the normal bivariate distribution of X and Y, every point
on a plane represents a possible pair of values. To set up the joint
probability surface for X and Y, an axis is drawn perpendicular
to that plane. On this axis the ordinate (which we shall call w)
is measured to represent the probability density of the X, Y
distribution. For each point this ordinate is

(10.50) $$w = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1 - \rho^2}} \exp\left\{-\frac{1}{2(1 - \rho^2)}\left[\left(\frac{X - \mu_x}{\sigma_x}\right)^2 - 2\rho\left(\frac{X - \mu_x}{\sigma_x}\right)\left(\frac{Y - \mu_y}{\sigma_y}\right) + \left(\frac{Y - \mu_y}{\sigma_y}\right)^2\right]\right\}$$

This probability surface has 5 parameters, $\mu_x$, $\mu_y$, $\sigma_x$, $\sigma_y$, and $\rho$.
The ordinate w is not a probability because probability is
represented by a volume, but it may be thought of as something like
the average height of such a volume when the area of its base is
very small, so that all the ordinates are nearly the same.

When the parameters are given definite values, w is defined for 
each point (X, Y), and varies from point to point. Populations 
with different parameters will have different probability distri- 
butions. 
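The ordinates of Formulas (10.49) and (10.50) can be evaluated directly. The sketch below is only an illustration: the parameter values are hypothetical, and the total volume under the surface is approximated by a crude sum over a grid extending six standard deviations on each side of the means.

```python
import math


def normal_density(x, mu, sigma):
    """Ordinate u of the univariate normal curve, Formula (10.49)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def bivariate_normal_density(x, y, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Ordinate w of the normal bivariate surface, Formula (10.50)."""
    zx = (x - mu_x) / sigma_x
    zy = (y - mu_y) / sigma_y
    q = (zx * zx - 2 * rho * zx * zy + zy * zy) / (1 - rho * rho)
    norm = 2 * math.pi * sigma_x * sigma_y * math.sqrt(1 - rho * rho)
    return math.exp(-0.5 * q) / norm


# Hypothetical parameter values
MU_X, MU_Y, SIGMA_X, SIGMA_Y, RHO = 30.0, 15.0, 2.0, 1.0, 0.6

# Approximate the total volume under the surface by summing small prisms
dx, dy = SIGMA_X / 10, SIGMA_Y / 10
volume = sum(
    bivariate_normal_density(MU_X + i * dx, MU_Y + j * dy,
                             MU_X, MU_Y, SIGMA_X, SIGMA_Y, RHO) * dx * dy
    for i in range(-60, 61)
    for j in range(-60, 61)
)

# With rho = 0 the surface is the product of the two marginal curves
w_zero = bivariate_normal_density(31.0, 14.5, MU_X, MU_Y, SIGMA_X, SIGMA_Y, 0.0)
product = normal_density(31.0, MU_X, SIGMA_X) * normal_density(14.5, MU_Y, SIGMA_Y)

print(round(volume, 4), round(w_zero, 6), round(product, 6))
```

The total volume comes out very close to 1, as a probability distribution requires, and when $\rho = 0$ the joint ordinate is simply the product of the two univariate ordinates.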

The normal bivariate probability distribution is traditionally 
spoken of as having the form of a “cocked hat.” The marginal 
distributions for X and for Y are both normal. Furthermore, if 
the three dimensional solid is cut by any plane perpendicular to 
the (X, Y) plane, the intersection is a normal curve. 

If any bivariate probability distribution, whether normal or 
not, is cut by a plane parallel to the X, Y plane, the intersection 
will be a kind of contour line marking out those points in the plane 
which have equal probability densities, that is points at which the 
ordinates are equal. For the normal bivariate distribution all 
such contour lines are ellipses with their center at the point 


$$(X = \mu_x,\; Y = \mu_y)$$


All such ellipses for one distribution have the same axes so they do 
not intersect each other. 

Each normal surface has its own set of ellipses. These ellipses 
are of particular interest because when a sample from a normal 
bivariate population is plotted on a scatter diagram the distribu- 
tion of tallies is concentrated about the point representing the 
sample means in accordance with the form of the ellipses obtained 
from the distribution. This phenomenon is familiar to all persons 
who have worked with scatter diagrams and corresponds to the 
concentration of tallies about the mean of a univariate normal 
population. Figure 10-2 shows how these ellipses change as the 
parameters of the distribution are changed. 

FIG. 10-2. Ellipses of equal probability density for normal bivariate surfaces
having different values of the five parameters, $\mu_x$, $\mu_y$, $\sigma_x$, $\sigma_y$, $\rho_{xy}$.

Distribution of the Correlation Coefficient. One difficulty in
problems of statistical inference in relation to the correlation
coefficient arises because the range of this coefficient is $-1$ to $+1$ both
for $r$ and $\rho$. Hence $r$ can never have a truly normal distribution
regardless of the form of the population sampled. When the
population is bivariate normal and $\rho = 0$ the distribution of $r$ is
symmetrical and when the number of cases is large the distribution
of $r$ can be reasonably approximated by the normal curve.

When $\rho$ differs from zero the sample r's cluster about the
population value as may be expected. But now there is less space
on one side of $\rho$ than on the other. Hence the clustering is denser
on one side than on the other, and the distribution of $r$ is skewed.

A mathematical formula which describes the distribution of $r$
for any value of $\rho$ was derived by R. A. Fisher in a classic paper
which ushered in a new way of thinking about statistical
distributions (see reference 6), but it is very complicated, involving $\rho$
and N as well as $r$. This exact formula for the sampling
distribution of $r$ is seldom used but as a substitute there are several
approximations which are valid under certain assumptions and give
results accurate enough for all practical purposes. The exact
distribution of $r$ has been published by David with a discussion of
the accuracy of approximation of the various substitute methods
for testing the significance of $r$.²

When $N = 2$, $r$ can take only the values $+1$ and $-1$ and so the
distribution consists merely of an ordinate at each of these points.
When $N = 3$ and $\rho = 0$ the distribution of $r$ is U-shaped, that is,
symmetrical but lower in the center than at the ends. As N
increases the distribution becomes higher in the center and lower
at the ends until for $N = 30$ or more, its form is approximately
normal.
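The change of shape described above can be checked against the density of $r$ for $\rho = 0$. The formula used in this sketch is not given in the text; it is the standard result for samples from a normal bivariate population with zero correlation, in which the density of $r$ is proportional to $(1 - r^2)^{(N-4)/2}$. The function name is illustrative.

```python
import math


def r_density_rho_zero(r, n):
    """Density of r in samples of n cases when rho = 0 (bivariate normal).

    Standard result, stated here as an assumption not taken from the text:
    f(r) = Gamma((n-1)/2) / (sqrt(pi) * Gamma((n-2)/2)) * (1 - r^2)^((n-4)/2)
    """
    const = math.gamma((n - 1) / 2) / (math.sqrt(math.pi) * math.gamma((n - 2) / 2))
    return const * (1 - r * r) ** ((n - 4) / 2)


for n in (3, 4, 10, 30):
    center = r_density_rho_zero(0.0, n)
    toward_end = r_density_rho_zero(0.9, n)
    print(n, round(center, 4), round(toward_end, 4))
```

For $N = 3$ the ordinate at $r = .9$ exceeds the ordinate at the center (the U shape), for $N = 4$ the two are equal, and for larger N the center rises while the ends fall.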

Tests of the Hypothesis that Correlation is Zero in the
Population. When a normal bivariate population has the parameter
$\rho = 0$, the statistic

$$F = \frac{r^2(N-2)}{1 - r^2}$$

has the F distribution with 1 degree of freedom for the numerator
and $N - 2$ for the denominator. Consequently $\sqrt{F} = t$, and the
statistic

(10.28) $$t = \frac{r\sqrt{N-2}}{\sqrt{1 - r^2}}$$

has "Student's" distribution with $N - 2$ degrees of freedom. This
is an exact distribution and applies to samples of 3 or more.
(Obviously correlation is meaningless in a sample of 2 cases because
$r$ could have only the values $+1$ and $-1$ unless both values of
one variate were alike and then $r$ would be indeterminate.) The
reader should notice that while $r$ ranges only from $-1$ to $+1$, $t$
ranges from $-\infty$ to $+\infty$.

Formula (10.28) has already appeared as providing a test for
the hypothesis $\beta_{yx} = 0$. But when $\rho = 0$, $\beta_{yx} = 0$ and when $\beta_{yx} = 0$,
$\rho = 0$, so the hypotheses $\rho = 0$ and $\beta_{yx} = 0$ are really the same.

Table XI shows for varying values of $n = N - 2$ the value of $r$
such that the critical region for rejection of the hypothesis $\rho = 0$
consists of all values of $r$ numerically greater than the tabulated
value of $r$.

A second test of the hypothesis $\rho = 0$ is available when the
sample is not small, say $N = 30$ or more. This test is obtained
from the fact that if $\rho = 0$ and N is not small the distribution of $r$
is approximately normal with mean zero and standard error
$1/\sqrt{N-1}$. Therefore, under these circumstances

(10.51) $$z = r\sqrt{N-1}$$

has approximately the unit normal distribution.
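The two test statistics are easily computed side by side. The sketch below evaluates Formulas (10.28) and (10.51) for one illustrative pair of values, $r = .20$ and $N = 102$; the function names are not from the text.

```python
import math


def t_statistic(r, n):
    """Exact test of rho = 0, Formula (10.28); refer to t with n - 2 d.f."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def z_statistic(r, n):
    """Large-sample test of rho = 0, Formula (10.51); refer to unit normal."""
    return r * math.sqrt(n - 1)


r, n = 0.20, 102
t = t_statistic(r, n)
z = z_statistic(r, n)
F = r * r * (n - 2) / (1 - r * r)   # the F ratio whose square root is t

print(round(t, 4), round(z, 4), round(F, 4))
```

For this sample size the two statistics differ only in the second decimal place, and the square of t reproduces F.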

Warning. A formula which is widely but incorrectly used should
be mentioned in order that the student may understand why it is
no longer in good repute. This formula is sometimes written
$\sigma_r = \frac{1 - r^2}{\sqrt{N}}$ and sometimes $s_r = \frac{1 - r^2}{\sqrt{N-1}}$. The standard error of $r$
is correctly given by the formula

(10.52) $$\sigma_r = \frac{1 - \rho^2}{\sqrt{N-1}}$$

But this involves the unknown $\rho$. If $\rho = 0$, this formula
becomes

(10.53) $$\sigma_r = \frac{1}{\sqrt{N-1}}$$

as previously stated. There is no justification for substituting
$r$ for $\rho$ in (10.52) to produce the incorrect formulas quoted at the
beginning of this paragraph and to do so sometimes produces
results that are extremely misleading.

Another incorrect practice which is so common that it cannot
be passed over in silence is that of computing a probable error of
$r$ as $.6745(1 - r^2)/\sqrt{N-1}$ and attaching it to $r$, as for example,
$r = .65 \pm .03$. A "probable error" has meaning only in a normal
distribution and so is obviously out of place in this connection.

EXERCISE 10.2

Compute both $r\sqrt{N-2}/\sqrt{1 - r^2}$ and $r\sqrt{N-1}$ for $r = .06$, $.20$
and $.60$ and $N = 5$, 26, 82, 102, and 401. Examine the results to see
whether the two formulas appear to be in closer agreement when N is
large or N is small. Is there anything in the foregoing explanation which
would account for this apparent tendency? Do the two formulas appear
to agree more closely for large or for small $r$? Can you account for this
tendency?


Tests that $\rho$ is Some Value Other than Zero. When $\rho$ is
different from zero, the distribution of $r$ is skewed and neither of
the tests described in the previous section can be used. To obtain
the data of Figures 10-3 and 10-4 samples were drawn from a
bivariate population for which $\rho$ was .58. The distribution of
$r$ in samples of 5 cases as shown in Figure 10-3 is very skewed.
For samples of 20 cases, as shown in Figure 10-4, the distribution
is markedly less skewed though still far from normal. As N
increases the distribution of $r$ becomes more nearly symmetrical
so that for extremely large samples with $\rho$ not too near $+1$ or
$-1$ treating $\frac{r - \rho}{\sigma_r}$ as normally distributed would give fairly
satisfactory results. However a much better procedure is available
which does not depend upon the dubious assumption of normality
for $r$.

FIG. 10-3. Distribution of $r$ in samples of 5 cases from a population
in which $\rho = .58$. (Horizontal axis: scale of $r$.)

FIG. 10-4. Distribution of $r$ in samples of 20 cases from a population
in which $\rho = .58$. (Horizontal axis: scale of $r$.)




A variable, which is usually called z but which we shall call $z_r$,
in order to distinguish it from the other variables which have
already been denoted by that letter, is related to $r$ by the formula

(10.54) $$z_r = \frac{1}{2}\log_e\frac{1 + r}{1 - r}$$

(10.55) $$= 1.1513\,\log_{10}\frac{1 + r}{1 - r}$$

(10.56) $$= 1.1513\,[\log_{10}(1 + r) - \log_{10}(1 - r)]$$

This variable, introduced by R. A. Fisher, has two very great
advantages. Its distribution is approximately normal even for
small samples in which $\rho$ is near 1, and its standard error does not
depend on the unknown population value but only on the size
of the sample,

(10.57) $$\sigma_{z_r} = \frac{1}{\sqrt{N-3}}$$

If $\zeta$ is the population value corresponding to $\rho$, then $(z_r - \zeta)\sqrt{N-3}$
may be treated as a normally distributed variable.

The transformation of $r$ to $z_r$ and vice versa can be made easily
by reference to Table XII.

Suppose we have $r = .76$ for $N = 228$, and we wish to test the
hypothesis that $\rho = .80$ with $\alpha = .05$. A correct procedure would
be as follows:

(1) From Table XII we find that if $r = .76$, $z_r = 1.00$

(2) From Table XII we find that if $\rho = .80$, $\zeta = 1.10$

(3) $\sqrt{N-3} = 15$

(4) Then $(1.00 - 1.10)15 = -1.50$

(5) As this is a one-sided test, we read from the normal
probability curve $z_\alpha = z_{.05} = -1.64$. The critical region is
$z < -1.64$.

(6) The hypothesis that $\rho = .80$ is not rejected.

We may contrast with this the decision which might have been
incorrectly obtained by computing $\sigma_r = .024$ and

$$(r - \rho)/\sigma_r = -1.67.$$

The computations are not incorrect but as we have no
probability distribution to which the statistic $-1.67$ can be referred,
they lead nowhere. If now we incorrectly refer it to the normal
probability distribution and reject the hypothesis $\rho = .80$ because
$-1.67 < z_{.05}$, we shall make an incorrect decision.
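The worked example can be retraced without Table XII by computing the transformation directly. In this sketch the critical value $-1.64$ is the one quoted in the text; the function names are illustrative. Because the text rounds $z_r$ and $\zeta$ to two decimals before multiplying, the unrounded statistic differs slightly from $-1.50$.

```python
import math


def fisher_z(r):
    """Formula (10.54): z_r = (1/2) log_e[(1 + r) / (1 - r)]."""
    return 0.5 * math.log((1 + r) / (1 - r))


def z_test_statistic(r, rho_0, n):
    """(z_r - zeta) * sqrt(N - 3), treated as a unit normal deviate."""
    return (fisher_z(r) - fisher_z(rho_0)) * math.sqrt(n - 3)


statistic = z_test_statistic(0.76, 0.80, 228)
CRITICAL = -1.64                  # lower 5 per cent point of the unit normal
reject = statistic < CRITICAL

# Constant used in Formulas (10.55) and (10.56): half the natural log of 10
constant = 0.5 * math.log(10)

print(round(fisher_z(0.76), 3), round(fisher_z(0.80), 3))
print(round(statistic, 3), reject, round(constant, 4))
```

The decision agrees with step (6): the statistic does not fall in the critical region, so the hypothesis $\rho = .80$ is not rejected.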




Confidence Interval for $\rho$. The z-transformation is useful in
obtaining a confidence interval for $\rho$. The procedure is as follows:

(1) In Table XII find the value of $z_r$ corresponding to the
sample value of $r$

(2) Compute a confidence interval for the population value $\zeta$
by the formula

(10.58) $$z_r + \frac{z_{\alpha/2}}{\sqrt{N-3}} < \zeta < z_r + \frac{z_{1-\alpha/2}}{\sqrt{N-3}}$$

Here one must remember that $z_r$ and $z_\alpha$ are two quite different
values represented by the same letter only because of tradition
and the brevity of our alphabet.

(3) In Table XII read the value of $\rho$ corresponding to each of
the bounding values of $\zeta$. From these form the interval
for $\rho$.

Thus suppose we have obtained $r = .65$ in a sample of 40 cases
and we want an interval estimate for $\rho$ with confidence coefficient
.95. From Table XII we read $z_r = .78$. From the normal
probability table we find

$$z_{.025} = -1.96 \quad \text{and} \quad z_{.975} = 1.96. \qquad \sqrt{N-3} = \sqrt{37}.$$

Then $$.78 - \frac{1.96}{\sqrt{37}} < \zeta < .78 + \frac{1.96}{\sqrt{37}} \quad \text{or} \quad .46 < \zeta < 1.10$$

and $$.43 < \rho < .80$$

Notice that because of the skewness in the distribution of $r$ the
confidence limits for $\rho$ are not symmetrically placed about the
observed $r$, but $.80 - .65 = .15$ while $.65 - .43 = .22$.
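The three steps of this procedure can be sketched with the inverse hyperbolic tangent standing in for Table XII, since $z_r = \tanh^{-1} r$ is the same transformation as Formula (10.54). The function name is illustrative, and the critical value 1.96 is the one used in the text. Because the text rounds the table entries to two decimals, the limits computed here may differ from the printed ones in the last place.

```python
import math


def rho_confidence_interval(r, n, z_crit):
    """Interval for rho by way of Formula (10.58).

    z_crit is the upper critical value of the unit normal deviate;
    the lower critical value is its negative.
    """
    z_r = math.atanh(r)                       # step (1): r to z_r
    half_width = z_crit / math.sqrt(n - 3)    # step (2): interval for zeta
    zeta_low, zeta_high = z_r - half_width, z_r + half_width
    return math.tanh(zeta_low), math.tanh(zeta_high)   # step (3): back to rho


low, high = rho_confidence_interval(0.65, 40, 1.96)
print(round(low, 3), round(high, 3))
```

The upper limit lies closer to the observed $r$ than the lower limit does, which is the asymmetry the text points out.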

David's Tables of the Correlation Coefficient² show the complete
cumulative distribution for $r$ in samples from populations for which
$\rho = 0.0, .1, .2, .3, .4, .5, .6, .7, .8, .9$ and $3 \le N \le 25$, $N = 50$,
100, 200 or 400. She also presents confidence belts for $\rho$ for samples
of the sizes listed, with $\alpha = .10$, $\alpha = .05$, $\alpha = .02$ and $\alpha = .01$.
The chart for $\alpha = .05$ has been reproduced as Chart XIV in the
Appendix.

Test of the Hypothesis That Two Independent Populations
Have the Same Correlation. If a sample is drawn from each of
two independent, normal bivariate populations the difference
$r_1 - r_2$ should fluctuate around $\rho_1 - \rho_2$. If each r is transformed
to z, the difference $z_1 - z_2$ will also fluctuate around $\zeta_1 - \zeta_2$ as mean
and will have as standard deviation

$\sqrt{\dfrac{1}{N_1-3} + \dfrac{1}{N_2-3}}$

Then

(10.59)  $\dfrac{z_1 - z_2 - (\zeta_1 - \zeta_2)}{\sqrt{\dfrac{1}{N_1-3} + \dfrac{1}{N_2-3}}}$

may be treated as a normal deviate. If the hypothesis that
$\zeta_1 = \zeta_2$ is rejected, the hypothesis $\rho_1 = \rho_2$ must also be rejected.
If the hypothesis $\zeta_1 = \zeta_2$ is sustained, the hypothesis $\rho_1 = \rho_2$ must
also be sustained.

In his research concerning Children's Collecting Activity (reference 3)
Durost studied the collections made by 50 boys and 50 girls between the
ages of 10 and 14. Among the correlations obtained was a correla-
tion between mental age and the average rating of the child's
collections (each of the collections of each child being rated as to
quality and the average taken for the child). For boys this
correlation was .31 and for girls .06. Do these figures provide
evidence that the relation between mental maturity and quality
of collections made is higher for boys than for girls?

            r       z_r      N − 3    1/(N − 3)
Boys       .31     .3205      47      .021277
Girls      .06     .0601      47      .021277

$z_B - z_G = .2604$        $\sigma^2_{z_B} + \sigma^2_{z_G} = .042554$

$z = \dfrac{.2604}{\sqrt{.0426}} = \dfrac{.2604}{.206} = 1.26$

With so small a value of z, it would be inappropriate to assume
that the relationship is higher for boys than for girls.
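The computation for the Durost data can be checked with a short Python sketch of Formula (10.59) under the hypothesis $\zeta_1 = \zeta_2$. The function name is hypothetical, and the z-transformation is computed directly instead of being read from Table XII.

```python
import math

def z_test_two_correlations(r1, n1, r2, n2):
    """Normal deviate for the difference between two independent correlations."""
    diff = math.atanh(r1) - math.atanh(r2)          # z_1 - z_2
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))     # standard deviation of z_1 - z_2
    return diff / se

z = z_test_two_correlations(0.31, 50, 0.06, 50)
print(round(z, 2))
```

The value is then referred to a table of normal probability in the usual way.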

Test of the Hypothesis That $\rho_{xz} = \rho_{yz}$ When Computed for
the Same Population. This situation is quite different from that
of the preceding section though often confused with it. There we
had measures on the same two variates X and Y for two different 
populations. Here we have for one population measures on three 
variates, X, Y, and Z, and we wish to know whether Z is more 
highly correlated with X than with Y, or vice versa. This ques- 
tion is particularly important when two predictors are available 
but the one showing the higher correlation with the criterion is 
more expensive, more time-consuming, or more difficult to apply. 
For example, suppose that a prognostic test, long and rather
difficult to administer, has yielded a correlation of .56 with Freshman
college marks, while a shorter test, less costly to administer, has
yielded a correlation of .43, the two correlations being obtained
from the same 100 subjects. If the tests were equally costly the
one yielding the higher correlation would be chosen regardless of
whether the difference in the r's was significant or not. Still
better, both tests would be used, and prediction made from a
multiple regression equation. However, let us suppose the college
authorities have decided to use only one test and to use the more
costly test only if its correlation with marks is significantly higher
than that of the other test. If the hypothesis $\rho_{xz} = \rho_{yz}$ proves
tenable, they have decided to use only the shorter test.


Let X = score on the longer test
    Y = score on the shorter test
    Z = average mark at end of first semester


Hotelling (reference 7) has given a solution of the problem of the
significance of the difference between $r_{xz}$ and $r_{yz}$ without making
any assumption as to the form of distribution of X or of Y in the
population, but with the limitation that generalization is only to a
subpopulation of all possible samples for which X and Y have exactly
the same set of values as those in the observed sample. It is assumed
that Z has a normal distribution for each value of X and for each
value of Y, with common variance.


(10.60)  Then  $t = (r_{xz} - r_{yz})\sqrt{\dfrac{(N-3)(1 + r_{xy})}{2(1 - r_{xz}^2 - r_{yz}^2 - r_{xy}^2 + 2r_{xz}r_{yz}r_{xy})}}$

has "Student's" distribution with N − 3 degrees of freedom.
Obviously it is necessary to know the correlation between the
two tests before the question can be answered. Assume that
$r_{xy} = .52$.

Then  $t = (.56 - .43)\sqrt{153.10} = 1.61$

In view of this small value of t it would be difficult to make a
strong case for the use of the longer test if only one test is to be
used. It is of course easy to show that the predictive value of
r = .43 is too low to be of much practical use.
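As a check on the arithmetic, Formula (10.60) can be evaluated directly. The sketch below is illustrative only; the function name `hotelling_t` is hypothetical, and the inputs are the values of the example ($r_{xz} = .56$, $r_{yz} = .43$, $r_{xy} = .52$, N = 100).

```python
import math

def hotelling_t(r_xz, r_yz, r_xy, n):
    """t statistic of Formula (10.60), with n - 3 degrees of freedom."""
    # Determinant of the 3 x 3 correlation matrix of X, Y and Z.
    det = 1 - r_xz**2 - r_yz**2 - r_xy**2 + 2 * r_xz * r_yz * r_xy
    return (r_xz - r_yz) * math.sqrt((n - 3) * (1 + r_xy) / (2 * det))

t = hotelling_t(0.56, 0.43, 0.52, 100)
print(round(t, 2))
```

The same function serves for Exercise 10.3, where only $r_{xy}$ is changed.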


EXERCISE 10.3 

1. (a) In the problem of the preceding paragraph, test the significance
of $r_{xz} - r_{yz}$ if N = 100 and $r_{xz} = .56$, $r_{yz} = .43$ as before but $r_{xy} = .20$.

(b) If $r_{xy} = .70$.




Regression Equations in the Normal Bivariate Population. In 
the normal bivariate population each of the variates may be re- 
garded as independent and as related to the other by a linear 
regression relationship. The two equations are defined in the 
population as 


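(10.61)  $\mu_{y.x} = \mu_y + \beta_{yx}(X - \mu_x)$  and  $\mu_{x.y} = \mu_x + \beta_{xy}(Y - \mu_y)$.

The corresponding standard errors of estimate are

(10.62)  $\sigma_{y.x} = \sigma_y\sqrt{1 - \rho^2}$  and  $\sigma_{x.y} = \sigma_x\sqrt{1 - \rho^2}$

The corresponding regression equations for the sample have this
form

(10.63)  $\hat{Y} = \bar{Y} + b_{yx}(X - \bar{X})$  and  $\hat{X} = \bar{X} + b_{xy}(Y - \bar{Y})$

and the corresponding standard errors are

(10.64)  $s_{y.x} = s_y\sqrt{\dfrac{N-1}{N-2}(1 - r^2)}$  and  $s_{x.y} = s_x\sqrt{\dfrac{N-1}{N-2}(1 - r^2)}$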


EXERCISE 10.4 
1. A prognostic test in mathematics was given to 98 students who were
about to begin a course in elementary statistics. The results of this test
were examined in relation to scores on the final examination. The pairs
of scores are not given below but data derived from the scores are shown.
In the following, X is a score on the prognostic test and Y the score on the
final examination. The sums of scores, sums of squares and cross-products
obtained by machine computation follow:
ΣX = 3075   ΣY = 4817   ΣX² = 101581   ΣXY = 154468   ΣY² = 245381
a. Compute $\bar{X}$, $\bar{Y}$, $b_{yx}$, $s_x$, $s_y$, $r_{xy}$, $s_{y.x}$
b. Find the regression equation to predict Y from X
c. What proportion of the variance of Y is independent of variation
in X?
d. Test the hypothesis H: ρ = 0
e. What Y value would you predict for a student with X = 62?
X = 88? X = 54?
f. What interval estimate would you make with confidence coefficient
.95
(1) for $\mu_{y.x}$ when X = 42  (2) for $\beta_{yx}$  (3) for ρ  (4) for $\sigma^2_{y.x}$
2. Suppose a study has yielded data as follows:
N = 122,  Σx′ = 94,  Σ(x′)² = 990,  $i_x$ = 3,  $A_x$ = 45
Σx′y′ = 420,  Σy′ = −35,  Σ(y′)² = 598,  $i_y$ = 5,  $A_y$ = 60
a. What is the equation to estimate Y from X?
b. What is the equation to estimate X from Y?
c. Find a confidence interval for ρ with confidence coefficient .99


d. Test the hypothesis that ρ = .60

e. Test the hypothesis that $\beta_{yx}$ = 1.25

3. In each of the following situations there is some inconsistency, some 
reason why real data would not yield these figures unless a mistake had 
been made. Explain what is wrong in each case. 
a. $\hat{Y}$ = .96X + 13.3 and $\hat{X}$ = 1.42Y − 12.5
b. Σxy = 1543, Σx² = 1136, and Σy² = 1560
c. r = .65, $b_{yx}$ = −.24, $b_{xy}$ = −.72
d. r = .60, $b_{yx}$ = .80, $b_{xy}$ = .90
e. r = .60, $s_y$ = 12, $s_{y.x}$ = 7.2
f. r = −.20, $b_{yx}$ = .80, $b_{xy}$ = .50
g. $\hat{Y}$ = −.9X + 5, $\hat{X}$ = −.6Y + 4, r = .43
h. r = .5, $s_x$ = 4, $s_y$ = 10, $b_{yx}$ = .25
т = 5, 8. = 6, & = 3, Ñ = т, $ —.25y 
4. In a sample of 203 cases from population A a correlation coefficient
of .66 has been found. In a sample of 153 cases from population B, a
coefficient of .70 has been found. Test H: $\rho_A = \rho_B$.

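Additional formulas often useful in computations for correlation and
regression:

$\sum xy = i_x i_y\left(\sum x'y' - \dfrac{(\sum x')(\sum y')}{N}\right) = \sum XY - \dfrac{(\sum X)(\sum Y)}{N}$

$2\sum xy = \sum x^2 + \sum y^2 - \sum (x - y)^2 = \sum (x + y)^2 - \sum x^2 - \sum y^2$

$\quad = \sum (X + Y)^2 - \sum X^2 - \sum Y^2 - \dfrac{2(\sum X)(\sum Y)}{N}$

$\quad = \sum X^2 + \sum Y^2 - \sum (X - Y)^2 - \dfrac{2(\sum X)(\sum Y)}{N}$

$N\sum xy = i_x i_y\left(N\sum x'y' - (\sum x')(\sum y')\right) = N\sum XY - (\sum X)(\sum Y)$

$2N\sum xy = N\sum X^2 + N\sum Y^2 - N\sum (X - Y)^2 - 2(\sum X)(\sum Y)$

$\sum y^2 = i_y^2\left(\sum (y')^2 - \dfrac{(\sum y')^2}{N}\right) = \sum Y^2 - \dfrac{(\sum Y)^2}{N}$

$N\sum y^2 = i_y^2\left(N\sum (y')^2 - (\sum y')^2\right) = N\sum Y^2 - (\sum Y)^2$

$b_{yx} = \dfrac{\sum xy}{\sum x^2}$  and  $b_{xy} = \dfrac{\sum xy}{\sum y^2}$

$t = \dfrac{(\bar{Y} - \mu_y)\sqrt{N}}{s_{y.x}} = (\bar{Y} - \mu_y)\sqrt{\dfrac{N(N-2)\sum x^2}{\sum x^2 \sum y^2 - (\sum xy)^2}}$

$s^2_{y.x} = \dfrac{\sum x^2 \sum y^2 - (\sum xy)^2}{(\sum x^2)(N - 2)}$

$t = \dfrac{b_{yx}}{s_b} = (\sum xy)\sqrt{\dfrac{N - 2}{\sum x^2 \sum y^2 - (\sum xy)^2}}$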


REFERENCES

1. Croxton, F. E. and Cowden, D. J., Applied General Statistics, New York,
Prentice Hall, 1940.

2. David, F. N., Tables of the Correlation Coefficient, Cambridge, England,
Cambridge University Press, 1938.

3. Durost, Walter N., Children's Collecting Activity Related to Social Factors,
New York, Teachers College, Columbia University, Bureau of Publications,
1932.

4. Dwyer, Paul, Linear Computations, New York, John Wiley and Sons, 1951.

5. Ezekiel, M., Methods of Correlation Analysis, New York, John Wiley and
Sons, 1930, 1941.

6. Fisher, R. A., "Frequency Distribution of the Values of the Correlation
Coefficient in Samples from an Indefinitely Large Population," Biometrika,
10 (1915), 507.

7. Hotelling, Harold, "The Selection of Variates for Use in Prediction, with
some Comments on the General Problem of Nuisance Parameters," Annals
of Mathematical Statistics, 11 (1940), 271-283.

8. Mills, F. C., Statistical Methods, New York, Henry Holt and Company, 1938.

9. Pease, Katherine, Machine Computation of Elementary Statistics, New York,
Chartwell House Inc., 1949.

10. Walker, H. M., Mathematics Essential for Elementary Statistics, New York,
Henry Holt and Company, 1952.

11. Walker, H. M., Studies in the History of Statistical Method, Baltimore, The
Williams and Wilkins Company, 1929.

12. Waugh, Albert E., Elements of Statistical Method, New York, McGraw-Hill.


11 Other Measures of Relationship


This chapter will take up methods of measuring relationship
in situations where the assumptions specified in Chapter 10
are not wholly met, and will consider the appropriate tests of
significance. It will also deal with certain problems related to
the use of the product moment correlation in special situations.

Biserial Correlation. When one of two traits is scored on a 
continuous scale and the other on a scale having only two values, 
the resulting correlation is called a biserial correlation. The scale 
with only two values is called dichotomous. This situation arises 
very often in statistical studies. A vocational counselor is con- 
cerned to know the relationship of some measured trait or traits 
to such a dichotomous trait as employment status (employed or 
unemployed), success on the job (fired or not fired), sex (male or 
female), liking for a type of work or activity (liking, disliking), 
technical training for a job (trained, untrained), and the like. A 
test writer needs to analyze the items in a test to retain those 
which have a high correlation with respect to some criterion and 
to reject those which have a low correlation. If the criterion is a 
scaled trait and the item is dichotomous, a biserial correlation is 
called for. The item scores are dichotomous if only two answers 
are possible (such as “true” or “false”), or if several possible 
answers are classified into those which are acceptable and those 
which are not acceptable. In an attempt to discover measures 
which could be used to predict success in a professional school, 
one might use "survival" as a criterion, classifying students into 
groups, those who drop out and those who complete the course. 
A biserial coefficient of correlation is used in a very wide variety of 
situations of which the foregoing are illustrative. 

Two different coefficients of biserial correlation will be discussed,
the point biserial coefficient for which we shall use the
symbol $r_{pb}$, and the biserial coefficient for which we shall use the
symbol $r_{bis}$. These symbols are not standardized; $r_{bis}$ is in fairly
wide use but there is no generally accepted symbol for the point
biserial.




The Point Biserial Coefficient of Correlation. This coefficient 
provides a very simple solution to the problem of biserial correla- 
tion. Arbitrary numbers are assigned to the two classes of the 
dichotomous variable. The outcome will be the same no matter
what numbers are chosen but the use of 0 and 1 holds computa- 
tional labor to a minimum and so these are commonly chosen. 
Each individual then has a pair of scores, one which is based on 
a continuous scale and one which is either 0 or 1. The correlation 
coefficient may then be computed by the formulas of Chapter 10. 
This is a product moment correlation. It can be used in combina- 
tion with other product moment coefficients in a multiple regression 
equation. It has the same range of values as other product moment
r’s, from — 1 to +1. Its significance is tested in similar fashion. 

However the general formula can be reduced to a very simple
formula for computing the point biserial coefficient. Suppose N
individuals have been measured, and of these $N_1$ fall in the category
which is arbitrarily scored X = 1 while $N_0$ fall into the class
which is arbitrarily scored X = 0. Let the means of the scaled
trait for these two classes be $\bar{Y}_1$ and $\bar{Y}_0$. For the entire group
$N = N_0 + N_1$, the mean is $\bar{Y} = \frac{1}{N}(N_1\bar{Y}_1 + N_0\bar{Y}_0)$ and

$s_y^2 = \dfrac{\sum Y^2 - N\bar{Y}^2}{N - 1}$

Then the point biserial correlation is

(11.1)  $r_{pb} = \dfrac{\bar{Y}_1 - \bar{Y}_0}{s_y}\sqrt{\dfrac{N_1 N_0}{N(N-1)}}$

and will be positive if $\bar{Y}_1$ is larger than $\bar{Y}_0$. If $N_1/N = p$ and
$N_0/N = q$, the formula may be conveniently written as

(11.2)  $r_{pb} = \dfrac{\bar{Y}_1 - \bar{Y}_0}{s_y}\sqrt{\dfrac{Npq}{N-1}}$


If a number of biserial coefficients are to be computed from the
same data, the work is reduced by using a formula involving $\bar{Y}$
(which can be computed from the entire group and remains constant
from one correlation to another) and either $\bar{Y}_1$ or $\bar{Y}_0$ but not
both. If $N_1$ is smaller the formula involving $\bar{Y}_1$ is used; if $N_0$ is
smaller, the formula involving $\bar{Y}_0$.

(11.3)  $r_{pb} = \dfrac{\bar{Y}_1 - \bar{Y}}{s_y}\sqrt{\dfrac{N N_1}{(N-1)N_0}}$

or

(11.4)  $r_{pb} = \dfrac{\bar{Y} - \bar{Y}_0}{s_y}\sqrt{\dfrac{N N_0}{(N-1)N_1}}$

Only very simple algebra is needed to demonstrate the equivalence
of these formulas and their derivation from the more general
formula for the product moment coefficient.

If N is very large so that $\dfrac{N}{N-1}$ may be treated as 1, the formula
reduces to any of the following forms:

(11.5)  $r_{pb} = \dfrac{\bar{Y}_1 - \bar{Y}_0}{s_y}\sqrt{pq}$

(11.6)  $r_{pb} = \dfrac{\bar{Y}_1 - \bar{Y}}{s_y}\sqrt{\dfrac{p}{q}} = \dfrac{\bar{Y}_1 - \bar{Y}}{s_y}\sqrt{\dfrac{N_1}{N_0}}$

(11.7)  $r_{pb} = \dfrac{\bar{Y} - \bar{Y}_0}{s_y}\sqrt{\dfrac{q}{p}} = \dfrac{\bar{Y} - \bar{Y}_0}{s_y}\sqrt{\dfrac{N_0}{N_1}}$
The use of point biserial correlation may be illustrated by data
gathered by Spaney. A battery of tests was administered to
308 student nurses shortly after their admission to nursing school.
Among the measures obtained was a rating of the student nurses
on the trait steadiness-emotionality, and these ratings may be
treated as a continuous scale. Six months later, at the end of
the preclinical period, 29 students had withdrawn. The mean
steadiness-emotionality rating for the 29 withdrawals was
$\bar{Y}_0 = 35.07$ and for the 279 who stayed on it was $\bar{Y}_1 = 30.00$.
For the entire group of 308 the standard deviation was $s_y = 11.62$.
Hence the point biserial coefficient was

$r_{pb} = \sqrt{\dfrac{(29)(279)}{(308)(307)}}\;\dfrac{30.00 - 35.07}{11.62} = -.13$

Since the scale is so arranged that a high score indicates steadiness
and a low score emotionality, the correlation suggests a slight
tendency for those staying on to be, in general, less steady, more
emotional than those withdrawing. The correlation is significantly
different from zero according to the test discussed in a
subsequent section, but the correlation is so low that the rating could
not be used to furnish any useful prediction as to whether an
individual student is or is not likely to withdraw.
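The computation from group summaries, as in Formula (11.1), may be sketched as follows. The function name and argument order are hypothetical; the group that stayed on is treated as the class scored X = 1 and the withdrawals as the class scored X = 0.

```python
import math

def point_biserial(mean_1, mean_0, n_1, n_0, s_y):
    """Point biserial r from group means, group sizes and the overall s_y."""
    n = n_1 + n_0
    return (mean_1 - mean_0) / s_y * math.sqrt(n_1 * n_0 / (n * (n - 1)))

# Student nurses: 279 stayed on (mean 30.00), 29 withdrew (mean 35.07).
r_pb = point_biserial(30.00, 35.07, 279, 29, 11.62)
print(round(r_pb, 2))
```

Interchanging the two classes changes only the sign of the coefficient.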




Table 11.1 presents the kind of data commonly assembled by 
a test technician making an item analysis, except that he would 
have more subjects and more items. Since $\bar{Y}$ = 43.4 for every


TABLE 11.1 Point Biserial and Biserial Correlation Coefficients Obtained
for 20 Subjects on Each of 5 Items

            Criterion
Subject     Score Y     Item 1   Item 2   Item 3   Item 4   Item 5
   1          50          1        1        1        0        0
   2          46          1        1        0        0        1
   3          37          0        0        0        0        1
   4          48          1        1        1        0        0
   5          47          1        1        1        0        0
   6          40          0        0        0        1        1
   7          39          0        0        0        0        1
   8          43          1        1        0        1        1
   9          42          0        1        0        1        1
  10          46          1        1        1        0        0
  11          42          0        1        0        0        1
  12          41          0        1        0        0        1
  13          43          0        1        0        1        1
  14          43          0        1        0        1        1
  15          44          1        1        1        0        0
  16          42          0        1        0        0        1
  17          41          0        0        0        1        1
  18          45          1        0        0        0        0
  19          46          1        1        0        0        0
  20          43          1        0        0        1        1

N_1                       10       14        5        7       13
N_0                       10        6       15       13        7
ΣY_1                     458      623      235      295      542
ΣY_0                     410      245      633      573      326
Ȳ_1                     45.8     44.5     47.0     42.1     41.7
Ȳ_0                     41.0    40.83     42.2     44.1     46.6

For the criterion score: ΣY = 868, Ȳ = 43.40, ΣY² = 37862,
(868)²/20 = 37671.2, Σy² = 190.8, s_y = 3.17
ть YE ВОВА 267 арта 
Tois ТЕ) — 95 


item, the use of Formula (11.3) would make it unnecessary to
compute both $\bar{Y}_1$ and $\bar{Y}_0$, so the experienced test technician might
compute only the one based on the smaller number of cases. For
item 3, the computation of point biserial would in this case be

$r_{pb} = \dfrac{47.0 - 43.4}{3.17}\sqrt{\dfrac{(5)(20)}{(15)(19)}} = .673$
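The item analysis of Table 11.1 can be reproduced from the raw scores. The following Python sketch applies Formula (11.1) to each of the five items; the data are transcribed from the table, while the function and variable names are hypothetical.

```python
import math

# (criterion score Y, item 1, item 2, item 3, item 4, item 5) for 20 subjects
rows = [
    (50, 1, 1, 1, 0, 0), (46, 1, 1, 0, 0, 1), (37, 0, 0, 0, 0, 1),
    (48, 1, 1, 1, 0, 0), (47, 1, 1, 1, 0, 0), (40, 0, 0, 0, 1, 1),
    (39, 0, 0, 0, 0, 1), (43, 1, 1, 0, 1, 1), (42, 0, 1, 0, 1, 1),
    (46, 1, 1, 1, 0, 0), (42, 0, 1, 0, 0, 1), (41, 0, 1, 0, 0, 1),
    (43, 0, 1, 0, 1, 1), (43, 0, 1, 0, 1, 1), (44, 1, 1, 1, 0, 0),
    (42, 0, 1, 0, 0, 1), (41, 0, 0, 0, 1, 1), (45, 1, 0, 0, 0, 0),
    (46, 1, 1, 0, 0, 0), (43, 1, 0, 0, 1, 1),
]

def point_biserial(y, x):
    """Formula (11.1): y is the scaled trait, x the 0/1 item score."""
    n = len(y)
    mean = sum(y) / n
    s_y = math.sqrt(sum((v - mean) ** 2 for v in y) / (n - 1))
    y_1 = [v for v, item in zip(y, x) if item == 1]
    y_0 = [v for v, item in zip(y, x) if item == 0]
    n_1, n_0 = len(y_1), len(y_0)
    diff = sum(y_1) / n_1 - sum(y_0) / n_0
    return diff / s_y * math.sqrt(n_1 * n_0 / (n * (n - 1)))

y = [row[0] for row in rows]
r_pb = [point_biserial(y, [row[k] for row in rows]) for k in range(1, 6)]
for item, value in enumerate(r_pb, start=1):
    print(item, round(value, 3))
```

Items 4 and 5 give negative coefficients, since for them the mean criterion score of those scored 1 is below that of those scored 0.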




Test of Significance for Point Biserial Correlation. It will be
assumed that in the population the distribution of Y scores can be
described by two normal curves, one curve for the Y scores paired
with X = 0 and one for the Y scores paired with X = 1. These
two curves have the same standard deviation $\sigma_y\sqrt{1 - \rho_{pb}^2}$, where
$\rho_{pb}$ is the point biserial coefficient for the population. The mean for
the Y scores paired with X = 0 is $\mu_0 = \mu_y + \rho_{pb}\dfrac{\sigma_y}{\sigma_x}(0 - \bar{X})$ and the
mean of Y scores paired with X = 1 is $\mu_1 = \mu_y + \rho_{pb}\dfrac{\sigma_y}{\sigma_x}(1 - \bar{X})$.

These are the same assumptions of normality of distribution
and homogeneity of variance which were made in Chapter 7 for
testing the hypothesis that the means of two populations are equal.

To test the hypothesis that $\rho_{pb} = 0$, the statistic

(11.8)  $t = \dfrac{r_{pb}\sqrt{N - 2}}{\sqrt{1 - r_{pb}^2}}$

may be used as in Chapter 10.
It has "Student's" distribution with N − 2 degrees of freedom.
If $\rho_{pb} = 0$, then $\mu_1 = \mu_0 = \mu_y$, and the test that $\rho_{pb} = 0$ is also the
test that the two normal distributions have the same mean. If
any of the formulas for $r_{pb}$ is substituted in (11.8) the latter can
be reduced identically to the familiar formula first presented in
Chapter 7 for comparing the means of two independent populations
with equal variance:


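(11.9)  $t = \dfrac{\bar{Y}_1 - \bar{Y}_0}{\sqrt{\dfrac{\sum Y^2 - (\sum Y_1)^2/N_1 - (\sum Y_0)^2/N_0}{N_1 + N_0 - 2}\cdot\dfrac{N_1 + N_0}{N_1 N_0}}}$

For the correlation obtained for item 3 of Table 11.1, Formula
(11.8) would give

$t = \dfrac{.673\sqrt{18}}{\sqrt{1 - (.673)^2}} = \dfrac{2.855}{\sqrt{.5471}} = 3.86$

while Formula (11.9) would give

$t = \dfrac{47.0 - 42.2}{\sqrt{\dfrac{37862 - (235)^2/5 - (633)^2/15}{18}\cdot\dfrac{20}{5 \times 15}}} = \dfrac{4.8}{\sqrt{1.546}} = 3.86$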
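The agreement of Formulas (11.8) and (11.9) for item 3 of Table 11.1 can be verified numerically. In this sketch the function names are hypothetical, and ΣY² = 37862 is the sum of squares of the twenty criterion scores in the table.

```python
import math

def t_from_r(r_pb, n):
    """Formula (11.8): t computed from the point biserial coefficient."""
    return r_pb * math.sqrt(n - 2) / math.sqrt(1 - r_pb ** 2)

def t_from_means(mean_1, mean_0, sum_sq, sum_1, sum_0, n_1, n_0):
    """Formula (11.9): t for two independent means with pooled variance."""
    pooled = (sum_sq - sum_1 ** 2 / n_1 - sum_0 ** 2 / n_0) / (n_1 + n_0 - 2)
    return (mean_1 - mean_0) / math.sqrt(pooled * (n_1 + n_0) / (n_1 * n_0))

t_a = t_from_r(0.673, 20)
t_b = t_from_means(47.0, 42.2, 37862, 235, 633, 5, 15)
print(round(t_a, 2), round(t_b, 2))
```

The two values differ only through the rounding of $r_{pb}$ to three decimals.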


Confidence limits for $\rho_{pb}$ can be obtained by an application of
the non-central t distribution. This is the distribution of t when
the mean of t, E(t), is not equal to zero. "Student's" distribution
may be regarded as central since it is the distribution of t when
E(t) = 0. Applications of non-central t including tables for its
use have been given by Johnson and Welch (see reference 7).

When $\rho_{pb} = 0$ the mean of $t = \dfrac{r_{pb}\sqrt{N-2}}{\sqrt{1 - r_{pb}^2}}$ is zero, hence t has
"Student's" distribution. When $\rho_{pb} \neq 0$ the mean of
$t = \dfrac{r_{pb}\sqrt{N-2}}{\sqrt{1 - r_{pb}^2}}$
is not zero but depends on $\rho_{pb}$, hence the distribution is non-central.
For each value of $\rho_{pb}$, t has a separate distribution. If a value of r
has been observed the set of non-central distributions makes possible
the calculation of confidence limits. (For a fuller discussion,
see reference 12.)

When the Johnson-Welch tables are not available, and when
N is not less than 20, an approximation using tables of normal
probability will give fairly satisfactory results. Suppose an interval
with confidence coefficient 1 − α is sought.

First compute  $t = \dfrac{r_{pb}\sqrt{N - 2}}{\sqrt{1 - r_{pb}^2}}$.

Then compute

(11.10)  $\delta_1 = t + z_{\frac{1}{2}\alpha}\sqrt{1 + \dfrac{t^2}{2(N-2)}}$

(11.11)  and  $\delta_2 = t + z_{1-\frac{1}{2}\alpha}\sqrt{1 + \dfrac{t^2}{2(N-2)}}$

Then the required interval is

(11.12)  $\dfrac{\delta_1}{\sqrt{N + \delta_1^2}} < \rho_{pb} < \dfrac{\delta_2}{\sqrt{N + \delta_2^2}}$

The reader should be warned that the approximation becomes
inaccurate if confidence coefficients of less than .95 are used. However
the Johnson-Welch tables make it possible to obtain confidence
intervals at a variety of levels.

From the point biserial correlation of −.13 obtained for 308
student nurses, previously described, we shall now make an
interval estimate for $\rho_{pb}$.

$t = \dfrac{-.13\sqrt{306}}{\sqrt{1 - (.13)^2}} = -2.29$

Since t has "Student's" distribution with 306 degrees of freedom
when $\rho_{pb} = 0$, it is clear that the hypothesis $\rho_{pb} = 0$ may be rejected.

To calculate confidence limits with 1 − α = .95 we have

$\delta_1 = -2.29 - 1.96\sqrt{1 + \dfrac{(-2.29)^2}{2(306)}} = -4.26$

$\delta_2 = -2.29 + 1.96\sqrt{1 + \dfrac{(-2.29)^2}{2(306)}} = -.32$

Then  $\dfrac{-4.26}{\sqrt{308 + (-4.26)^2}} < \rho_{pb} < \dfrac{-.32}{\sqrt{308 + (-.32)^2}}$

or  $-.236 < \rho_{pb} < -.018$
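The normal approximation of Formulas (11.10) to (11.12) lends itself to a short program. The sketch below is offered under one stated assumption: the quantity under the radical is taken to be $1 + t^2/2(N-2)$, which reproduces the limits of the worked example. The function name is hypothetical, and the normal deviate is obtained from Python's `statistics.NormalDist` rather than from a table.

```python
import math
from statistics import NormalDist

def approx_interval_rho_pb(r_pb, n, confidence=0.95):
    """Approximate limits for the population point biserial coefficient."""
    t = r_pb * math.sqrt(n - 2) / math.sqrt(1 - r_pb ** 2)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # 1.96 for .95
    # Assumed form of the radical: 1 + t^2 / (2(N - 2)).
    spread = z * math.sqrt(1 + t ** 2 / (2 * (n - 2)))
    limits = []
    for delta in (t - spread, t + spread):
        limits.append(delta / math.sqrt(n + delta ** 2))  # Formula (11.12)
    return tuple(limits)

low, high = approx_interval_rho_pb(-0.13, 308)
print(round(low, 3), round(high, 3))
```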


The Biserial Coefficient of Correlation. А different approach 
to biserial correlation was made by Karl Pearson (reference 15). 
His method depends on the assumption that the dichotomous 
variable is actually continuous. Call this variable X and assume 
further with Pearson that the distribution of X is unit normal, 
that the regression of the continuous variable Y on X is linear
and that for each X the variance of Y is the same as for every
other X. A further assumption will be made below that the joint
distribution of X and Y is normal bivariate.

Suppose that in the dichotomous variable a proportion p of
the cases in the sample are in one class, and a proportion q = 1 − p
are in the other. The corresponding situation on the continuous
variable X is represented by locating a point A on the X scale
such that an ordinate drawn at A divides the hypothetical normal
distribution of the dichotomous variable into two areas with
proportions p and q. We suppose that each individual in the
sample has a value on the X scale, but that the only information
available about an X value is that it is greater than or less than A.

To illustrate this situation, suppose that we wish to compute 
the relationship between response to an objective test item and 
the total score on the test. For each person we know whether he 
answered the item correctly or incorrectly, and know his total 
test score. Under the assumptions made above, the person’s 
actual knowledge about the subject matter tested by the item 
has some value on the scale of the continuous variable X, but the 
only information available about his knowledge is that it exceeds, 
or is less than, the value of X at some point A on the X scale. 
That point can be determined by the proportion of persons who 
answer the item correctly and the tables of the normal curve. In 
actual practice the score corresponding to A is not needed, as will 
be shown in the following discussion. 




In practice, we find the coordinates of two points on the line
of regression of Y on X and use this information to estimate the
correlation between the variables. One of these points has
coordinates $\bar{X}_0$ and $\bar{Y}_0$, which are the means for the two variables
of those persons who answered the item incorrectly. Let q be
the proportion and qN the number of such persons. Then q is
represented by the area under the normal curve to the left of A.
The X mean of that part of the normal distribution is given by
the formula

(11.13)  $\bar{X}_0 = -\dfrac{u}{q}$

where u is the ordinate * of the unit normal probability curve at
the point A. This ordinate can be read from Table III if the table
is entered with the area between the ordinate at A and the ordinate
at the mean. The value of $\bar{Y}_0$ is computed directly from the Y scores
of the qN persons in this subgroup.

The second point has coordinates $\bar{X}_1$ and $\bar{Y}_1$. $\bar{X}_1$ is the mean
of that portion of the unit normal curve to the right of A. This
portion has area p and mean

(11.14)  $\bar{X}_1 = \dfrac{u}{p}$

The value of $\bar{Y}_1$ is computed directly from the scores of the pN
persons who answered the item correctly.

The two points which determine the regression line are therefore
the points $\left(-\dfrac{u}{q}, \bar{Y}_0\right)$ and $\left(\dfrac{u}{p}, \bar{Y}_1\right)$. The line passing through
these two points is the regression line and its slope is

(11.15)  $b_{yx} = \dfrac{\bar{Y}_1 - \bar{Y}_0}{\dfrac{u}{p} - \left(-\dfrac{u}{q}\right)} = \dfrac{(\bar{Y}_1 - \bar{Y}_0)\,pq}{u}$

The slope of the regression line is also given by the expression

(11.16)  $b_{yx} = r_{bis}\dfrac{s_y}{s_x}$

Since the distribution of X has been taken to be unit normal
$s_x = 1$ and so $b_{yx} = r_{bis}s_y$. Equating the two expressions for the
slope we have

$\dfrac{(\bar{Y}_1 - \bar{Y}_0)\,pq}{u} = r_{bis}s_y$

* Most texts use the letter z for this ordinate, but we have already used z to
represent an abscissa of a normal curve.




Therefore we have the formula

(11.17)  $r_{bis} = \dfrac{\bar{Y}_1 - \bar{Y}_0}{s_y}\cdot\dfrac{pq}{u}$

Under the assumption that the joint distribution of X and Y is
normal bivariate $r_{bis}$ is an estimate of ρ. It can be regarded as an
appropriate estimate only when the sample is large.
We can also write $r_{bis}$ in the form

(11.18)  $r_{bis} = \dfrac{\bar{Y}_1 - \bar{Y}}{s_y}\cdot\dfrac{p}{u} = \dfrac{\bar{Y} - \bar{Y}_0}{s_y}\cdot\dfrac{q}{u}$

For item 3 of Table 11.1, p = .25 and u = .3177.

An interesting, and disconcerting, aspect of biserial r is
that its value may be numerically larger than 1. In fact, the
range of its values is unbounded in either direction.
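To illustrate Formula (11.17), the biserial coefficient for item 3 of Table 11.1 can be computed as follows. This is a sketch only: the ordinate u is obtained from Python's `statistics.NormalDist` instead of Table III, and the function name is hypothetical. The inputs are the item 3 summaries from the table (means 47.0 and 42.2, p = .25, $s_y$ = 3.17).

```python
from statistics import NormalDist

def biserial(mean_1, mean_0, p, s_y):
    """Biserial r by Formula (11.17); also returns the ordinate u."""
    q = 1 - p
    unit_normal = NormalDist()
    cut = unit_normal.inv_cdf(q)   # point A, with area q to its left
    u = unit_normal.pdf(cut)       # ordinate of the unit normal curve at A
    return (mean_1 - mean_0) / s_y * p * q / u, u

r_bis, u = biserial(47.0, 42.2, 0.25, 3.17)
print(round(u, 4), round(r_bis, 3))
```

The result is larger than the point biserial value for the same item, as the ratio of the two coefficients depends only on p, q, u and N.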

Sampling Theory of the Biserial Correlation Coefficient. The
exact sampling distribution for $r_{bis}$ is not known. Karl Pearson
derived a formula for its standard error, but that formula involves
the unknown parameter ρ and substitution of $r_{bis}$ for ρ is not
satisfactory.

An extensive study of biserial coefficients has recently been
completed by R. F. Tate. One valuable outcome of his study
is a transformation for $r_{bis}$ similar in purpose to Fisher's logarithmic
transformation of r to z in Table XII. When the dichotomous
cut is so made that in the population P = Q = 0.5 (approximately)
and when N is large, and $r_{bis}$ is not close to −1 or +1, Tate shows
that a quantity z* (read "z star") has a normal distribution with
standard error

(11.19)  $\sigma_{z^*} = \dfrac{1}{\sqrt{N}}$

The relation of z* to $r_{bis}$ is

(11.20)  $z^* = 1.0297\{\log_{10}(1 + .8944r_{bis}) - \log_{10}(1 - .8944r_{bis})\}$

The corresponding formula for the population is

(11.21)  $\zeta^* = 1.0297\{\log_{10}(1 + .8944\rho) - \log_{10}(1 - .8944\rho)\}$

Table XII giving the transformation of r to z may be used instead
of making the direct computation from logarithms. To utilize
this table we use the following transformations:




(11.22)  $r = .8944r_{bis}$  or  $r_{bis} = 1.118r$
(11.23)  $z = \frac{1}{2}[\log_e(1 + r) - \log_e(1 - r)]$
(11.24)  $z^* = .8944z$  or  $z = 1.118z^*$


Procedure for obtaining an interval estimate for ρ from $r_{bis}$:

(1) Multiply $r_{bis}$ by .8944.

(2) Read z corresponding to $.8944r_{bis}$ from the table.

(3) Multiply this value by .8944 to obtain z*.

(4) Make whatever test of hypothesis or interval estimate is
required for ζ*.

(5) If an interval estimate has been made for ζ* and the
corresponding interval is required for ρ, multiply the upper
and lower limits of the obtained interval by 1.118 for
values with which to reenter the table.

(6) Using the values obtained in step (5) read the
corresponding values of r from the table.

(7) Multiply these values of r by 1.118 to obtain the limits
for ρ.


Example. Given $r_{bis}$ = .72 when $N_0$ = 163 and $N_1$ = 185. Find an
interval estimate for ρ with .95 confidence coefficient. Since

p = 185/348 = .5316,

it may be assumed that P is near enough to 0.5 to justify using the method.

(1) $.8944r_{bis}$ = .6440

(2) If r = .6440, z = .765

(3) z* = .8944(.765) = .684

(4) $\sigma_{z^*} = 1/\sqrt{348}$ = .0536
    .684 − 1.96(.0536) < ζ* < .684 + 1.96(.0536)
    .579 < ζ* < .789

(5) 1.118(.579) = .647
    1.118(.789) = .882

(6) If z = .647, r = .570
    If z = .882, r = .707

(7) 1.118(.570) = .637
    1.118(.707) = .790
    Then .637 < ρ < .790
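The seven steps can be collapsed into a short function when the transformation of Table XII is computed directly. This is a sketch under the same conditions the text requires (P near 0.5, N large, $r_{bis}$ not near ±1); the function name is hypothetical, and `math.atanh` and `math.tanh` stand in for reading the table in each direction.

```python
import math
from statistics import NormalDist

K = 0.8944  # constant of Formulas (11.22) and (11.24); 1/K is about 1.118

def tate_interval(r_bis, n, confidence=0.95):
    """Interval estimate for rho from biserial r via the z* transformation."""
    z_star = K * math.atanh(K * r_bis)                  # steps (1) to (3)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z_crit / math.sqrt(n)                  # uses Formula (11.19)
    limits = (z_star - half_width, z_star + half_width) # step (4)
    # Steps (5) to (7): back to z, then to r, then to the scale of rho.
    return tuple(math.tanh(limit / K) / K for limit in limits)

low, high = tate_interval(0.72, 348)
print(round(low, 3), round(high, 3))
```

Small differences from the limits in the example arise from the three-decimal rounding of the table entries.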


Another aspect of the statistic $r_{bis}$ which Tate investigated is its
efficiency or the degree to which $r_{bis}$ approximates ρ. The
approximation is good when ρ = 0 and increasingly poorer as it approaches
−1 or +1. The approximation is good when P = .5 and
increasingly poorer as P is removed from 0.5.




Comparison of Biserial ($r_{bis}$) and Point ($r_{pb}$) Coefficients.
The following general comment is taken from Tate: "It
would seem from the evidence presented that point biserial r is
in most cases the better coefficient to use." The two coefficients
may now be compared on several points.

a. The statistical model. For $r_{bis}$ a normal bivariate universe is
assumed. If this is the case, then the distribution of Y cannot be
normal within the separate categories. For $r_{pb}$ no assumption is
made as to the distribution of the dichotomous variable, but
generalization is made only to a universe of samples of size N having
the same fixed number of cases $N_0$ and $N_1$ in the dichotomous
categories. For $r_{pb}$ it is assumed that Y is normally distributed within
each X category and that the two Y distributions have the same
variance. The validity of this assumption can be ascertained by
testing the two distributions for equality of variance and testing,
possibly only by inspection, the normality of each distribution.

b. Range of values of r. For point biserial the range is from
−1 to +1. For biserial r the range is unlimited.

c. Sampling distribution and tests of significance. The exact
sampling distribution of biserial r is unknown and significance
can be tested only if the transformation to z* can be made. This
is defensible only when P is near 0.5, N is large and ρ is not near
1 or −1. The exact sampling distribution of point biserial is
known and tests of significance and confidence intervals can be
obtained for any value of ρ and any size of N.

d. Use with other r's in a regression equation. For biserial r
such use is very dubious, for point biserial legitimate.

e. Position of dichotomous cut. For either coefficient, better
results are obtained when $N_0 = N_1$ than when they differ in size.

Fourfold Correlation. A measure of relationship may be 
needed in a problem where both traits are scored dichotomously. 
The chi-square test may be used in this situation to test for com- 
plete lack of relationship between the two traits, as was done in 
Chapter 4. However when the chi-square test indicates a signif- 
icant relationship it does not itself provide a measure of the 
strength of that relationship.

In Spaney's study of the prediction of success of students in
schools of nursing, students entering a school of nursing were
asked to mark items in a personality inventory and their responses
were studied in relation to their subsequent completion of the
course or withdrawal before the end of the course. Responses to
the item "Do you like performing on the radio" may be used as
an illustration.

                                                        Not
                                       Withdrawing   withdrawing   Total
Some liking for performing on radio        29            126        155
No liking for performing on radio           9             15         24
Total                                      38            141        179

χ² = 4.38        $\sqrt{\chi^2}$ = 2.09


The chi-square test indicates that at the .05 level we must 
reject the hypothesis of independence between liking for perform- 
ing on the radio and remaining in a school of nursing. If then the 
traits are not to be assumed independent, how strong is the 
relationship between them? 

Two methods of correlating dichotomously scored traits will 
be discussed. These methods are parallel to the two biserial 
coefficients. 

The Phi Coefficient. As for the dichotomous trait in the point
biserial coefficient, arbitrary values may be assigned to the two
categories into which the X-scale is divided and also to the two
categories into which the Y-scale is divided. For this purpose
any numbers whatever may be used and the final outcome will
be the same, but the work is minimized by using 0 and 1, as was
done for the point biserial r. Let a, b, c and d represent
frequencies, as in the formula for chi-square.

             0        1
    1        b        a        a + b
    0        d        c        c + d
           b + d    a + c

The application of the product moment formula gives the correlation
between the two traits as

(11.25)  $\phi = \dfrac{ad - bc}{\sqrt{(a + b)(c + d)(a + c)(b + d)}}$

Thus for the Spaney data, φ = .157.

(11.26)  $\chi^2 = N\phi^2$

where χ² is the statistic already given by Formula (4.7). Hence
the significance of φ is tested by referring Nφ² to a chi-square
table with 1 degree of freedom or by referring $\sqrt{N\phi^2} = \phi\sqrt{N}$ to
a table of normal probability. In other words, if the null
hypothesis for χ² is justified, the null hypothesis for φ is justified
and vice versa.

No way of finding confidence limits for φ is known.

Yates' correction should be applied to the numerator of φ
whenever it is needed for the numerator of χ².

In order to have the positive value of ¢ indicate a positive 
relationship, the table should be so set up that а and d represent 
the frequencies of individuals who possess both traits or neither 
trait while b and с represent the frequencies of individuals who 
possess one trait and not the other. (If a and d are used to repre- 
sent frequencies of individuals possessing one trait and not the 
other, then the numerator of ¢ should be written bc — ad.) 

The ¢ coefficient is a product moment correlation. 
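As a check on the arithmetic, Formulas (11.25) and (11.26) can be applied to the fourfold table for the radio item. The sketch below is illustrative only: the function names are not from the text, and the cells are assigned in the order the table is printed (a = 29, b = 126, c = 9, d = 15). With that assignment the product moment value comes out negative; its magnitude is the .157 quoted above.

```python
import math

def phi_coefficient(a, b, c, d):
    """Phi for a fourfold table: (ad - bc) over the root of the four marginal totals."""
    numerator = a * d - b * c
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator

def chi_square_fourfold(a, b, c, d):
    """Chi-square for a fourfold table, without Yates' correction."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Cell assignment assumed from the printed order of the table.
a, b, c, d = 29, 126, 9, 15
n = a + b + c + d

phi = phi_coefficient(a, b, c, d)
chi2 = chi_square_fourfold(a, b, c, d)

print("magnitude of phi:", round(abs(phi), 3))
print("chi-square:", round(chi2, 3))
print("N * phi^2:", round(n * phi ** 2, 3))          # same quantity as chi-square
print("phi * sqrt(N):", round(abs(phi) * math.sqrt(n), 2))
```

The last two lines show the two equivalent routes to a significance test: Nφ² against chi-square with 1 degree of freedom, or φ√N against the normal table.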

In making item analyses it is often convenient to divide the
criterion group into two equal parts, so that a + c = b + d = N/2.
This arrangement reduces the formulas for χ² and for φ to very
simple special formulas which permit considerable saving of time
when each of many dichotomous items is to be related to the same
dichotomous criterion. Under these circumstances

(11.27)    $\phi = \frac{a - b}{\sqrt{(a+b)(c+d)}} = \frac{d - c}{\sqrt{(a+b)(c+d)}}$

and

(11.28)    $\chi^2 = \frac{N(a-b)^2}{(a+b)(c+d)} = \frac{N(d-c)^2}{(a+b)(c+d)}$

where c + d is the number of individuals failing the item, a + b
is the number passing the item, a is the number in the upper
criterion group who pass the item and b is the number in the lower
criterion group who pass the item, d is the number in the lower
criterion group who fail the item, and c is the number in the
upper criterion group who fail the item.
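The equal-halves shortcut can be checked against the general fourfold formula. The sketch below uses hypothetical item counts (an upper and a lower criterion group of 100 each); the function names are illustrative and not taken from the text.

```python
import math

def phi_general(a, b, c, d):
    """General fourfold phi: (ad - bc) over the root of the marginal product."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

def phi_equal_halves(a, b, c, d):
    """Shortcut phi, valid only when the two criterion groups are equal in size."""
    if a + c != b + d:
        raise ValueError("criterion groups must be of equal size")
    return (a - b) / math.sqrt((a + b) * (c + d))

def chi_square_equal_halves(a, b, c, d):
    """Shortcut chi-square under the same equal-halves condition."""
    if a + c != b + d:
        raise ValueError("criterion groups must be of equal size")
    n = a + b + c + d
    return n * (a - b) ** 2 / ((a + b) * (c + d))

# Hypothetical item: 70 of the upper 100 pass, 40 of the lower 100 pass.
a, b, c, d = 70, 40, 30, 60
n = a + b + c + d

phi_long = phi_general(a, b, c, d)
phi_short = phi_equal_halves(a, b, c, d)
chi2_short = chi_square_equal_halves(a, b, c, d)

print(round(phi_long, 4), round(phi_short, 4))
print(round(chi2_short, 2), round(n * phi_short ** 2, 2))
```

Only the two pass counts a and b and the two marginal totals are needed per item, which is where the saving of time comes from.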

Tetrachoric Correlation. This coefficient is developed on the
assumption that both variables are continuous and that the joint
distribution of the variables is normal bivariate. From this point
of view the fourfold table may be regarded as a scatter diagram
divided into four quadrants by lines parallel to the coordinate
axes. The specific X and Y values of the observations are unknown;
all that is known is the frequency, or number of observations, in
each of the four quadrants. From these frequencies the value of ρ
in the normal bivariate population is estimated.


In his original paper on the tetrachoric coefficient of
correlation Karl Pearson (reference 14) derived some extremely
complex formulas for computation of this coefficient. These
formulas will not be reproduced here. However, some approximations
also derived by Pearson will be presented.

When both variables are split approximately at the median,
that is when a + b = c + d and a + c = b + d approximately, then
r_t, the tetrachoric coefficient, is given approximately by the
formula

(11.29)    $r_t = \sin\left(90^\circ \cdot \frac{(a + d) - (b + c)}{N}\right)$

Formula (11.29) is exact when the relations a + b = c + d and
a + c = b + d hold exactly.

Under other circumstances an approximation to r_t is obtainable
from the formula

(11.30)    $r_t = \cos\left(180^\circ \cdot \frac{\sqrt{bc}}{\sqrt{ad} + \sqrt{bc}}\right)$

A series of very useful diagrams from which the value of
tetrachoric r could be read by graphical interpolation were
published by Chesire, Saffir and Thurstone but are now out of print.
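The two approximations can be sketched in a few lines. The code assumes the median-split formula is the sine of 90° times (a + d − b − c)/N and the general approximation is the cosine of 180° times √bc/(√ad + √bc); the table used is hypothetical, chosen so that both splits fall exactly at the median, in which case the two formulas should agree.

```python
import math

def tetrachoric_median_split(a, b, c, d):
    """Sine approximation; appropriate when both variables are split near the median."""
    n = a + b + c + d
    return math.sin(math.radians(90.0 * ((a + d) - (b + c)) / n))

def tetrachoric_cosine(a, b, c, d):
    """Cosine approximation for tables not split at the median."""
    root_bc = math.sqrt(b * c)
    root_ad = math.sqrt(a * d)
    return math.cos(math.radians(180.0 * root_bc / (root_ad + root_bc)))

# Hypothetical fourfold table with both margins split 50-50.
a, b, c, d = 35, 15, 15, 35

r_sine = tetrachoric_median_split(a, b, c, d)
r_cosine = tetrachoric_cosine(a, b, c, d)

print(round(r_sine, 4), round(r_cosine, 4))
```

For tables whose margins depart from the median the two values will differ, and only the cosine form is intended for that case.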

Comparison of Phi and Tetrachoric r. Since phi and tetrachoric
r are both used to compute relationships in a fourfold, or
doubly dichotomous, distribution some discussion of the relative
merits of the two methods is called for.

The phi coefficient is preferred: 

1. Because phi can easily be computed for all distributions.
Tetrachoric r has meaning only for large samples. It is difficult
to compute r_t when one of the marginal proportions such as
(a + b)/N is small.

2. Because the phi coefficient readily provides a test for the
hypothesis of independence through χ². Use of r_t for this purpose
presents almost insurmountable difficulties.


3. When the traits are logically dichotomous rather than 
continuous. 


The tetrachoric coefficient is preferred: 

1. When the traits involved in the relationship may logically 
be assumed to be continuous. 

2. Because tetrachoric r provides an estimate of ρ. Phi is
not an estimate of a parameter.

3. Because tetrachoric r ranges from −1 to +1 regardless of
the marginal frequencies. On the other hand, the values of phi
are restricted by the marginal frequencies. Consider, for example,
the fourfold distribution with marginal frequencies 100, 100,
160, 40. For this fourfold, r_t can take values from −1 to +1,
whereas phi is restricted to the range −.5 to +.5. A situation such
as that indicated in the fourfold may arise in correlating responses
to an item with test scores using high-low halves. The range of
phi is always restricted when the ratio of marginal frequencies
for one variable differs from this ratio in the other variable.
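The restriction on phi can be worked out directly from the margins. The sketch below (function and variable names are illustrative) finds the smallest and largest frequency that one cell can hold given fixed marginal totals and evaluates phi at each extreme.

```python
import math

def phi_range(row1, row2, col1, col2):
    """Smallest and largest phi attainable with the given marginal totals."""
    if row1 + row2 != col1 + col2:
        raise ValueError("row and column totals must sum to the same N")
    scale = math.sqrt(row1 * row2 * col1 * col2)

    def phi_for(a):
        # a is the frequency in the (row 1, column 1) cell; the rest follow.
        b = row1 - a
        c = col1 - a
        d = row2 - c
        return (a * d - b * c) / scale

    a_low = max(0, row1 - col2)   # fewest cases the cell can hold
    a_high = min(row1, col1)      # most cases the cell can hold
    return phi_for(a_low), phi_for(a_high)

low, high = phi_range(100, 100, 160, 40)
print(low, high)

# With matching marginal ratios the full range is available.
low_equal, high_equal = phi_range(100, 100, 100, 100)
print(low_equal, high_equal)
```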

Estimating Correlation from the Tails of the Distribution.
A time-saving method devised by Flanagan (reference 4) is useful
for estimating relationship when one variable is dichotomous and
the other continuous. Such a situation arises in correlating a test
item with scores on a test. The scores on the test are arranged in
order of size and the upper and lower 27% are selected for further
computation. The middle 46% are put aside. From the proportion
of correct responses in each of the high and low 27% groups
a measure of relationship is obtained.

The coefficient calculated in this way is an estimate of ρ in a
bivariate normal population. The dichotomous trait is assumed
to be actually continuous. However, the only information available
is that given by the frequencies in the four corners of the
hypothetical scatter diagram. The estimate of ρ obtained in this
way has been shown by Kelley (reference 9) and by Mosteller
(reference 13) to be more efficient than tetrachoric r, at least
when ρ is zero. Though biserial r can be used in this situation
Flanagan's method is very much faster.

Table XIII (prepared by Flanagan) provides a ready means of
computing the correlation coefficient when the percentages in the
upper and lower 27% groups are known. In analyzing test items
it is convenient, when possible, to select a sample of N = 370 cases
because 27% of 370 = 100. The upper 100 and lower 100 of the
papers in the sample are then used for further computation. For
each item the per cents correct in the two selected groups are
used for estimation of ρ.

In Chapter 16 a method analogous to the one just described
is used for estimating ρ when both variables are continuous.
Though this method is less efficient than product moment r it is
much faster to use; so it is sometimes advisable to use a larger
sample and an inefficient method of estimation.


The Correlation Ratio. The correlation ratio is a measure of
relationship which is useful in two circumstances:

1. When both variables are continuous but the regression is
not linear. This situation is illustrated in the relationship
between age and IQ in Table 11.2.

2. When one variable is continuous and the other is discrete.
Problems of this sort were considered in Chapter 9 under the
heading of analysis of variance.

The development of a formula for the correlation ratio is
similar to its development for the product moment coefficient r
in Chapter 10. By simple algebra the following relationship can
be derived from Formula (10.4)

(11.31)    $r_{xy}^2 = 1 - \frac{\sum (Y - \tilde{Y})^2}{(N - 1)s_y^2}$

In this formula Ỹ is given by a linear regression equation. When
linear regression cannot be assumed the estimate Ỹ is no longer
applicable. Instead, the mean of a column of the scatter diagram
is the best estimate of the scores in that column. Call the mean
of the jth column Ȳ_j and replace Ỹ in Formula (11.31) by Ȳ_j.
Designating the correlation ratio of Y on X by E_yx, we then obtain
the formula

(11.32)    $E_{yx}^2 = 1 - \frac{\sum_{j=1}^{k} \sum_{\alpha=1}^{N_j} (Y_{j\alpha} - \bar{Y}_j)^2}{(N - 1)s_y^2}$

where Y_jα is the αth score in the jth column, N_j is the number
of scores in that column, and k is the number of columns.

If both variables are continuous, there are two correlation
ratios which in general are not equal. By analogy the correlation
ratio of X on Y is E_xy where

(11.33)    $E_{xy}^2 = 1 - \frac{\sum_{j=1}^{k} \sum_{\alpha=1}^{N_j} (X_{j\alpha} - \bar{X}_j)^2}{(N - 1)s_x^2}$

Here X̄_j is the mean of the jth row, and k is the number of rows,
which is not necessarily the same as the number of columns.

From the definition of the correlation ratio the following
characteristics of this coefficient can be determined.

1. If the means of all columns are equal to each other, and
therefore to the general mean, E²_yx = 1 − 1 = 0.

2. If the means differ greatly from each other and the
observations within a column are very close to the mean of the
column, E²_yx is close to one.

3. The square root of E²_yx is always taken positively, therefore
E_yx ranges from zero to one.

An alternative formula for E²_yx is obtainable through the
subdivision of sum of squares as in analysis of variance.

(11.34)    $\sum_{j=1}^{k} \sum_{\alpha=1}^{N_j} (Y_{j\alpha} - \bar{Y})^2 = \sum_{j=1}^{k} \sum_{\alpha=1}^{N_j} (Y_{j\alpha} - \bar{Y}_j)^2 + \sum_{j=1}^{k} N_j (\bar{Y}_j - \bar{Y})^2$

Appropriate substitution in Formula (11.32) leads to the formula
for E²_yx

(11.35)    $E_{yx}^2 = \frac{\sum N_j (\bar{Y}_j - \bar{Y})^2}{(N - 1)s_y^2} = \frac{\sum N_j (\bar{Y}_j - \bar{Y})^2}{\sum \sum (Y_{j\alpha} - \bar{Y})^2}$

From Formula (11.35) it can be seen that E²_yx is the ratio which
the sum of squares of column means around the total mean bears
to the total sum of squares of scores around the total mean. A
convenient formula for computing E²_yx is

(11.36)    $E_{yx}^2 = \frac{\sum_{j=1}^{k} \frac{T_j^2}{N_j} - \frac{T^2}{N}}{(N - 1)s_y^2}$

where $T_j = \sum_{\alpha=1}^{N_j} Y_{j\alpha}$ and $T = \sum_{j=1}^{k} T_j$.
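The two routes to E²_yx, through deviations of column means and through column totals, can be compared on a small example. The scores below are hypothetical and the function names are illustrative; the point is only that both forms give the same ratio of between-column to total sum of squares.

```python
def correlation_ratio_squared(columns):
    """E^2 of Y on X: between-column sum of squares over total sum of squares."""
    scores = [y for column in columns for y in column]
    n = len(scores)
    grand_mean = sum(scores) / n
    total_ss = sum((y - grand_mean) ** 2 for y in scores)
    between_ss = sum(
        len(column) * (sum(column) / len(column) - grand_mean) ** 2
        for column in columns
    )
    return between_ss / total_ss

def correlation_ratio_squared_from_totals(columns):
    """Same quantity computed from column totals, avoiding the means."""
    n = sum(len(column) for column in columns)
    grand_total = sum(sum(column) for column in columns)
    correction = grand_total ** 2 / n
    total_ss = sum(y * y for column in columns for y in column) - correction
    between_ss = sum(sum(column) ** 2 / len(column) for column in columns) - correction
    return between_ss / total_ss

# Hypothetical Y scores falling in three columns (three X categories).
columns = [[1, 2, 3], [4, 5, 6], [7, 8, 9, 10]]

e2_means = correlation_ratio_squared(columns)
e2_totals = correlation_ratio_squared_from_totals(columns)
print(round(e2_means, 4), round(e2_totals, 4))
```

Because the grouping enters only through the column totals, reordering the columns leaves the result unchanged.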


If data are grouped in the form of a scatter diagram and scores
are coded, Formula (11.35) is modified by use of the following
notation:

N_ij is the frequency in a cell in the ith row and jth column.

N_i. is the total frequency in the ith row.

N_.j is the total frequency in the jth column.

N is the total number of cases in the sample.

x' and y' are coded X and Y scores as explained in Chapter 10.

$T_i = \sum_{j=1}^{k} N_{ij} x'_j$ is the sum of x' scores in the ith row, each
multiplied by its appropriate frequency.

$T_j = \sum_{i=1}^{h} N_{ij} y'_i$ is the sum of the y' scores in the jth column,
each multiplied by the appropriate frequency, and h is the number
of rows.

$T_x = \sum T_i$          $T_y = \sum T_j$

Then we have the following formulas

(11.37)    $E_{yx}^2 = \frac{\sum_j \frac{T_j^2}{N_{.j}} - \frac{T_y^2}{N}}{\sum_i N_{i.} (y'_i)^2 - \frac{T_y^2}{N}}$

and

(11.38)    $E_{xy}^2 = \frac{\sum_i \frac{T_i^2}{N_{i.}} - \frac{T_x^2}{N}}{\sum_j N_{.j} (x'_j)^2 - \frac{T_x^2}{N}}$

Computation of E²_xy is illustrated in Table 11.2 which shows
the joint frequency distribution for age and intelligence quotient
of 109 pupils in a fourth grade. The arbitrary origin for age has
been placed at 96 months, with step intervals of 5 months. The
arbitrary origin for intelligence quotient has been placed at 66 with
step interval of 7.
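The arithmetic of Table 11.2 can be retraced from its row summaries. In the sketch below the row frequencies and the row sums of coded age are a reading of the table; the entries for the 84-90 row are inferred from the printed totals (109 cases, sum 413), so they should be treated as an assumption. The value 2035 is the sum of squared coded ages used in the table's own computation.

```python
import math

# Row frequencies and row sums of coded age, highest IQ interval first.
# Assumption: the 84-90 row (N = 5, T = 24) is inferred from the totals.
row_n = [1, 4, 4, 10, 20, 16, 15, 15, 9, 5, 6, 1, 3]
row_t = [1, 9, 10, 25, 54, 49, 51, 69, 42, 24, 37, 12, 30]
sum_coded_age_squared = 2035

n = sum(row_n)
t = sum(row_t)
sum_t2_over_n = sum(ti ** 2 / ni for ti, ni in zip(row_t, row_n))
correction = t ** 2 / n

# Between-rows sum of squares over total sum of squares of coded age.
e_squared = (sum_t2_over_n - correction) / (sum_coded_age_squared - correction)
e = math.sqrt(e_squared)

print(n, t, round(sum_t2_over_n, 2))
print(round(e_squared, 3), round(e, 3))
```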

The population values of the correlation ratios are indicated
by the symbols η_yx and η_xy. (η is the small Greek letter eta.) A
test for the hypothesis η_yx = 0 is given by the F ratio

(11.39)    $F = \frac{E_{yx}^2 / (k - 1)}{(1 - E_{yx}^2) / (N - k)}$

with n₁ = k − 1 and n₂ = N − k degrees of freedom. This is
identical with the F test for the hypothesis μ₁ = μ₂ = ··· = μ_k
described in Chapter 9. This is logical, for η_yx = 0 if the column
means are equal in the population.

The test for linearity of regression is the test that η²_yx = ρ² and
is made by computing the ratio

(11.40)    $F = \frac{(E_{yx}^2 - r^2) / (k - 2)}{(1 - E_{yx}^2) / (N - k)}$

with degrees of freedom n₁ = k − 2 and n₂ = N − k. This test is
derived from the analysis of the total variance into three
independent components shown as follows together with the degrees
of freedom.

(11.41)    $s_y^2 = r^2 s_y^2 + (E_{yx}^2 - r^2) s_y^2 + (1 - E_{yx}^2) s_y^2$

           N − 1 = 1 + (k − 1 − 1) + (N − k)

Rearranging the order of columns has no effect on E_yx but of
course may change r strikingly. Similarly rearranging the order
of rows has no effect on E_xy.
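The two F ratios can be sketched as small functions. The first call uses the value E² = .668 from Table 11.2 with N = 109 and k = 13 (there the groups are the 13 IQ intervals, since the ratio is that of age on IQ); the second call uses purely hypothetical figures to show the linearity test. Function names are illustrative.

```python
def f_ratio_eta_zero(e_squared, n, k):
    """F for the hypothesis that the population correlation ratio is zero."""
    return (e_squared / (k - 1)) / ((1.0 - e_squared) / (n - k))

def f_ratio_linearity(e_squared, r, n, k):
    """F for linearity: the excess of E^2 over r^2 against the within-group part."""
    return ((e_squared - r * r) / (k - 2)) / ((1.0 - e_squared) / (n - k))

# Table 11.2: E^2 = .668, N = 109, 13 groups -> 12 and 96 degrees of freedom.
f_eta = f_ratio_eta_zero(0.668, 109, 13)

# Hypothetical illustration: E^2 = .50, r = .60, N = 100, 10 groups
# -> 8 and 90 degrees of freedom.
f_lin = f_ratio_linearity(0.50, 0.60, 100, 10)

print(round(f_eta, 2), round(f_lin, 2))
```

Each F is then referred to the F table with the degrees of freedom noted in the comments.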

TABLE 11.2 Computation of the Correlation Ratio of Age
on Intelligence Quotient for 109 Fourth Grade Children

[Scatter diagram of intelligence quotient (rows) against age to
nearest month (columns); cell frequencies and age headings not
reproduced. Row summaries follow.]

Intelligence
quotient        N_i      y'      T_i     T_i²/N_i
147-153           1      12        1        1.00
140-146           4      11        9       20.25
133-139           4      10       10       25.00
126-132          10       9       25       62.50
119-125          20       8       54      145.80
112-118          16       7       49      150.06
105-111          15       6       51      173.40
 98-104          15       5       69      317.40
 91- 97           9       4       42      196.00
 84- 90           5       3       24      115.20
 77- 83           6       2       37      228.17
 70- 76           1       1       12      144.00
 63- 69           3       0       30      300.00
Total           109              413     1878.78

$E_{xy}^2 = \frac{1878.78 - \frac{(413)^2}{109}}{2035 - \frac{(413)^2}{109}} = \frac{(1878.78)(109) - (413)^2}{(2035)(109) - (413)^2} = .668$

E_xy = √.668 = .817

r_xy = −.723

Correlation Among Ranks. There are two situations in which
it may be expedient to work with ranks rather than scores and to
seek a measure of relationship among ranks. The first situation
arises when there is no satisfactory device for scoring the trait in
question but the individuals can be placed in a rank order in
respect to the degree of the trait they exhibit. In this situation it
is possible to say for any two individuals which one is higher on
the scale for this trait but not possible to say how much higher.
The reader will readily think of many situations of this kind, as
e.g. the attempt to place in order of merit the performances of a
number of contestants for an award in an area where the criteria
cannot easily be reduced to quantitative terms, or to place
individuals in order in respect to their possession of some
intangible such as "attractiveness," "sense of responsibility,"
"courtesy," and the like. If the group is large, determining the
order of individuals becomes very difficult, so the rank order
correlation is obviously more useful for small samples.

A second situation arises when traits which can be measured 
on a scale are recorded as ranks instead. The chief occasion for 
doing this is when the distribution of scores is obviously not normal 
and a measure of relationship is sought which does not depend for 
its validity upon the assumption of a normal bivariate universe. 
The advantage of the rank order correlation coefficient in this 
situation is worthy of considerable attention. 

Nearly all the tests of significance commonly used are derived
on the assumption that sampling is from a normal universe. In
the case of the mean, those tests are still valid even when the
universe is far from normal, for the distribution of the mean rapidly
becomes normal as N increases (except for a very special case
which the theoretical statistician likes to talk about but which
is not at all likely to be encountered in practice). The sampling
distribution of the variance and of the correlation coefficient are
seriously disturbed by lack of normality. If a population is
bivariate normal, the standard error of r is σ_r = (1 − ρ²)/√N,
but if it is not bivariate normal that standard error may be quite
different, and how much different is usually unknown.

The formula for rank order correlation, given originally by
Spearman (reference 18), is

(11.42)    $R = 1 - \frac{6 \sum d^2}{N(N^2 - 1)}$

where N is the number of individuals ranked and d is the difference
in the ranks assigned to the same individual. In the computation,
a useful check is provided by the fact that Σd = 0. This formula
is derived by applying the usual product moment formula to the
ranks (see reference 6).

Suppose 12 cakes submitted in a competition at a country fair
have been ranked by two judges with results as shown in Table
11.3. Then Σd² = 40 and R = 123/143 = .86.

TABLE 11.3 Computation of Coefficient of Correlation Among Ranks
Assigned to 12 Cakes

            Rank Assigned by
Cake      Judge I    Judge II       d       d²
A            7          6           1        1
B            8          4           4       16
C            2          1           1        1
D            1          3          −2        4
E            9         11          −2        4
F            3          2           1        1
G           12         12           0        0
H           11         10           1        1
I            4          5          −1        1
J           10          9           1        1
K            6          7          −1        1
L            5          8          −3        9
                                        Σd² = 40

$R = 1 - \frac{6(40)}{12(143)} = 1 - \frac{20}{143} = \frac{123}{143} = .86$

Now let the ranks assigned by Judge I be denoted X and those
assigned by Judge II be denoted Y. Then

ΣX = 78                                 ΣY = 78
ΣX² = 650                               ΣY² = 650
Σx² = 650 − (78)²/12 = 143              Σy² = 650 − (78)²/12 = 143
ΣXY = 630                               Σxy = 630 − (78)(78)/12 = 123

$r = \frac{123}{\sqrt{(143)(143)}} = \frac{123}{143} = .86$

This computation illustrates the fact that the rank order
coefficient and the product moment coefficient applied to the ranks
are identical. However, if scores are transformed to ranks (a
procedure which Hotelling calls uniformizing the distribution) the
product moment correlation of the ranks (which is the rank order
coefficient) is almost certain to be different from the product
moment coefficient of the original scores.
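The identity between the rank order coefficient and the product moment coefficient of the ranks can be confirmed on the cake data of Table 11.3. This is a minimal sketch with illustrative function names; it assumes rankings without ties, as in the table.

```python
import math

# Ranks given to cakes A through L by the two judges (Table 11.3).
judge_1 = [7, 8, 2, 1, 9, 3, 12, 11, 4, 10, 6, 5]
judge_2 = [6, 4, 1, 3, 11, 2, 12, 10, 5, 9, 7, 8]

def rank_order_coefficient(x, y):
    """Spearman's R from the rank differences; assumes no tied ranks."""
    n = len(x)
    sum_d2 = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return 1.0 - 6.0 * sum_d2 / (n * (n * n - 1))

def product_moment(x, y):
    """Ordinary product moment correlation of two score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

sum_d = sum(xi - yi for xi, yi in zip(judge_1, judge_2))  # check: should be 0
big_r = rank_order_coefficient(judge_1, judge_2)
small_r = product_moment(judge_1, judge_2)

print(sum_d, round(big_r, 2), round(small_r, 2))
```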

Test of Significance for Rank Order Coefficient. For very small
samples from a bivariate population in which the variables X and
Y are uncorrelated so that ρ_xy = 0, the exact distribution of the
rank order coefficient can be obtained by direct enumeration of
the number of permutations of the ranks which would produce
each possible value of the correlation. The distribution of R is
discrete and is bimodal in small samples. A table obtained in this
way, giving the probability distribution, not for R, but for Σd², has
been computed by Kendall with N ≤ 8. Obviously, the work
of direct enumeration, while possible, becomes very laborious for
larger values of N. (For N = 8 the number of permutations is
8! = 40320. For N = 9 it would be 9! = 362880.) Using this table
the values of R required for significance at a given level, from
samples of a given size, may be computed. Such values are given
in Table XVI in the Appendix.

Comparison of the entries in Table XVI with the corresponding
entries in Table XI (where N = n + 1) shows (at least for these
small values of N) that the rank order correlation must be larger
than the product moment correlation to achieve the same level
of significance.

For larger samples, an approximation is needed. If ρ = 0,
the standard error is

(11.43)    $\sigma_R = \frac{1}{\sqrt{N - 1}}$

and the sampling distribution of R approaches the normal form
as N increases. If N is as large as 25,

(11.44)    $z = R\sqrt{N - 1}$

may be safely referred to tables of normal probability. How to
deal with samples larger than 8 and smaller than 25 is still a
problem. For this situation the coefficient τ mentioned later in the
chapter may be used. How to test hypotheses other than ρ = 0
is also not known.
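Both routes to a significance test can be sketched briefly: direct enumeration of all orderings for a very small N, and the normal deviate R√(N − 1) for N of 25 or more. The function names and the example values are hypothetical. A small Σd² corresponds to a large positive R, so the lower tail of Σd² is the relevant one for positive agreement.

```python
import math
from itertools import permutations

def exact_lower_tail(n, observed_sum_d2):
    """P(sum of d^2 <= observed) when all n! orderings are equally likely."""
    base = range(1, n + 1)
    hits = 0
    total = 0
    for ordering in permutations(base):
        total += 1
        sum_d2 = sum((i - p) ** 2 for i, p in zip(base, ordering))
        if sum_d2 <= observed_sum_d2:
            hits += 1
    return hits / total

def normal_deviate(r, n):
    """Large-sample deviate for the rank order coefficient under rho = 0."""
    return r * math.sqrt(n - 1)

# Hypothetical small sample: N = 5 with an observed sum of d^2 equal to 2.
p_small = exact_lower_tail(5, 2)

# Hypothetical larger sample: R = .50 from N = 26 individuals.
z = normal_deviate(0.50, 26)

print(round(p_small, 4), z)
```

The enumeration loop visits N! orderings, which is why the exact method is practical only for very small samples.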

Estimation of Product Moment Coefficient from Rank Order
Coefficient. The relation between the product moment correlation
and the rank order coefficient was derived by Karl Pearson and
later confirmed by Hotelling (reference 5). It is

(11.45)    $\hat{\rho} = 2 \sin\left(\frac{\pi}{6} R\right)$

Then

(11.46)    $\sigma_{\hat{\rho}}^2 = \frac{\pi^2}{9(N - 1)}$

is the variance of ρ̂. Since the sampling variance of the product
moment r is σ_r² = 1/(N − 1) when ρ = 0, the variance of ρ̂ is

(11.47)    $\sigma_{\hat{\rho}}^2 = \frac{\pi^2}{9} \sigma_r^2 = 1.097 \sigma_r^2$

The meaning of this formula is that if in a sample of 100 cases from
a normal bivariate population, the product moment coefficient is
significant, it will require a sample of 110 cases to achieve the same
degree of significance for the rank order coefficient. Hence the
rank order coefficient has efficiency 1/1.097 = .91.


Formula (11.45) has often been used to “convert the rank order 
coefficient to a product moment coefficient." Unfortunately, it 
cannot perform that magic and its usefulness is somewhat dubious. 
A continuous variate is really not susceptible to ranking. If 
scores are available, the procedure of transforming them to ranks, 
computing the coefficient of rank order correlation, and then 
applying Formula (11.45) to attempt to estimate what the product 
moment coefficient might have been had it been computed, has 
nothing whatever to recommend it. Usually it does not even 
save time. 

Relation Among Ranks Given by Several Judges. Suppose
that 8 individuals have been ranked by 4 different judges who may
be called P, Q, R, and S as in Table 11.4. A single measure of the
general agreement among all four judges is desired. Probably it
would occur to most persons that the correlation of each judge
with every other judge might be found and the average of all such
coefficients taken. This has been done for the four judges of
Table 11.4, with results recorded at the right of that table. The
mean of the six rank order coefficients is R̄ = .738. If there are
m judges the number of the coefficients which must be computed
in order to find R̄ is m(m − 1)/2.

TABLE 11.4 Computation of the Coefficient of Concordance from Ranks
Assigned to 8 Subjects by 4 Judges

           Rank given by judge    Sum of
Subject     P    Q    R    S      ranks    Square      Judges        R
A           1    4    3    3        11       121       P, Q     34/42 = .810
B           2    1    4    1         8        64       P, R     30/42 = .714
C           3    3    2    6        14       196       P, S     31/42 = .738
D           4    2    1    2         9        81       Q, R     31/42 = .738
E           5    5    7    5        22       484       Q, S     34/42 = .810
F           6    6    5    4        21       441       R, S     26/42 = .619
G           7    8    6    7        28       784       Sum     186/42 = 4.429
H           8    7    8    8        31       961       Mean             .738
Total                              144      3132

$S = 3132 - \frac{(144)^2}{8} = 540$          $W = \frac{12(540)}{4^2(8)(8^2 - 1)} = \frac{45}{56} = .804$

$\bar{R} = \frac{4\left(\frac{45}{56}\right) - 1}{4 - 1} = \frac{31}{42} = .738$

Kendall has proposed a coefficient of concordance among
rankings and has obtained an approximation to its sampling
distribution. This coefficient of concordance has a linear relation
to R̄.

The steps in the computation of the coefficient of concordance 
are as follows: 

1. Find the sum of the ranks given by the m judges to each
subject. In Table 11.4 these sums are in column 6.

2. Verify that the sum of these sums-of-ranks is mN(N + 1)/2.
In Table 11.4 this is 4(8)(9)/2 = 144.

3. Find the mean of these sums-of-ranks. Here this is 18.

4. Obtain the sum of the squares of the deviations of the N
sums-of-ranks around their mean and call it S. For the data under
consideration, the deviations of the entries in the sixth column
from 18, which is their mean, are respectively

−7, −10, −4, −9, 4, 3, 10, 13

and the sum of the squares of these eight deviations is 540.
Alternatively, 3132 − (144)²/8 = 540.

If agreement among the ranks were perfect, the value of S would be
m²N(N² − 1)/12. The student can verify this by writing down a
set of ranks, duplicating it several times, and computing S. The
measure W is defined as the ratio of the observed S to the value S
would have if there were perfect agreement among the several
rankings:

(11.48)    $W = \frac{12S}{m^2 N(N^2 - 1)}$

is the coefficient of concordance. It may have values ranging from
0 to 1 but it cannot be negative. For the data under discussion,
W = 45/56 = .804. Then R̄, the average rank order coefficient among
all possible pairings of the judges, is related to W by the formula

(11.49)    $\bar{R} = \frac{mW - 1}{m - 1}$

For these data R̄ = (4(45/56) − 1)/3 = 31/42 = .738, which agrees
perfectly with the value obtained previously as mean of six
computed coefficients.
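The steps above can be followed in code. The individual ranks used below are a reading of Table 11.4 that reproduces its printed sums of ranks (11, 8, 14, 9, 22, 21, 28, 31); they should be treated as an assumption where the printed table is unclear. Function and variable names are illustrative.

```python
from itertools import combinations

# Ranks given to subjects A through H by each judge (assumed reading of Table 11.4).
ranks = {
    "P": [1, 2, 3, 4, 5, 6, 7, 8],
    "Q": [4, 1, 3, 2, 5, 6, 8, 7],
    "R": [3, 4, 2, 1, 7, 5, 6, 8],
    "S": [3, 1, 6, 2, 5, 4, 7, 8],
}
m = len(ranks)   # number of judges
n = 8            # number of subjects

# Steps 1-4: sums of ranks, their mean, and S.
sums = [sum(judge[i] for judge in ranks.values()) for i in range(n)]
mean_sum = sum(sums) / n
s = sum((total - mean_sum) ** 2 for total in sums)

# Coefficient of concordance: observed S over its value under perfect agreement.
w = 12 * s / (m ** 2 * n * (n ** 2 - 1))

def rank_order_coefficient(x, y):
    """Spearman's R from rank differences; assumes no ties."""
    size = len(x)
    sum_d2 = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return 1.0 - 6.0 * sum_d2 / (size * (size * size - 1))

pair_values = [
    rank_order_coefficient(ranks[p], ranks[q]) for p, q in combinations(ranks, 2)
]
mean_r = sum(pair_values) / len(pair_values)
mean_r_from_w = (m * w - 1) / (m - 1)

print(sums, s, round(w, 3))
print(round(mean_r, 3), round(mean_r_from_w, 3))
```

Computing W requires one pass over the sums of ranks, whereas the average of pairwise coefficients requires m(m − 1)/2 separate correlations.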

For small values of m (the number of judges) and N (the number
of subjects judged) the exact probability distribution of S
can be obtained by enumerating all possible permutations of the
N ranks for the m judges. Kendall has done this and furnishes a
table (page 412 of his Volume I) showing the probability that a
given S will be attained or exceeded for N = 3 and m = 2, 3,
... 10. He also gives a table for N = 4, m = 2, 3, 4, 5, or 6,
and for N = 5, m = 3. For larger values of N and m the F test
may be used:

(11.50)    $F = \frac{(m - 1)W}{1 - W}$

with degrees of freedom

$n_1 = N - 1 - \frac{2}{m}$     and     $n_2 = (m - 1)\left(N - 1 - \frac{2}{m}\right)$

If N and m are small, the correction for continuity should be made
by subtracting 1 from S and increasing the divisor of W by 2, so
that

(11.51)    $W' = \frac{S - 1}{\frac{m^2 N(N^2 - 1)}{12} + 2} = \frac{12(S - 1)}{m^2 N(N^2 - 1) + 24}$

For the data of Table 11.4, the test of significance would be

$W' = \frac{12(540 - 1)}{16(8)(63) + 24} = .800$     and     $F = \frac{3W'}{1 - W'} = 11.98$

with n₁ = 6.5 and n₂ = 19.5.
Reference to the F table for n₁ = 6 or n₁ = 7 and n₂ = 19 or n₂ = 20
shows that the largest of the values in the four cells indicated is
F.99 = 3.94. The observed F is even larger than that value so that
interpolation is needless. There can be no hesitancy in stating
that the four sets of ranks show agreement far beyond what
might be produced by sampling variance.

For such data what is the “best estimate" of the “true rank- 
ing”? These data have shown a significant concordance. Let it 
be assumed that the relations among the rankings reflect the true 
ranking, which of course may not be the case. Then the ranking 
of the sum of the ranks is the “best” ranking in the least square 
sense (see page 421 of reference 10). For the data considered, this 
would make the “best ranking” B, D, A, C, F, E, G, H. If the 
data had not shown significant concordance among the various 
rankings, then no composite ranking could be considered 
meaningful. 
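The corrected coefficient, its F ratio, and the composite ranking can all be obtained from the sums of ranks alone. The sketch below uses the sums of ranks for subjects A through H (11, 8, 14, 9, 22, 21, 28, 31) with m = 4 judges and N = 8 subjects; the variable names are illustrative.

```python
# Sums of ranks for the eight subjects ranked by four judges.
sums = {"A": 11, "B": 8, "C": 14, "D": 9, "E": 22, "F": 21, "G": 28, "H": 31}
m = 4            # judges
n = len(sums)    # subjects

mean_sum = m * (n + 1) / 2
s = sum((total - mean_sum) ** 2 for total in sums.values())

# Continuity correction: subtract 1 from S and add 2 to the divisor of W.
w_corrected = 12 * (s - 1) / (m ** 2 * n * (n ** 2 - 1) + 24)

# F ratio and its (fractional) degrees of freedom.
f_ratio = (m - 1) * w_corrected / (1 - w_corrected)
n1 = n - 1 - 2 / m
n2 = (m - 1) * n1

# Composite ranking: order the subjects by their sums of ranks.
best_ranking = sorted(sums, key=sums.get)

print(round(w_corrected, 4), round(f_ratio, 2), n1, n2)
print(best_ranking)
```

Ordering by sums of ranks is meaningful only after the concordance has been shown to be significant, as the text notes; ties among the sums would need a separate convention, which this sketch does not address.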

The Tau Coefficient. Kendall (see references 10 and 11) has
proposed a coefficient called τ (tau) which provides an alternative
method of measuring relationship between ranks. While not quite
so simple to compute as Spearman's coefficient of Formula (11.42),
τ has the great advantage that its sampling distribution is known,
especially in that area for moderate sized values of N larger than
10 and too small for use of the normal approximation, where there
is no very good method of testing the rank order coefficient. This
measure will not be described here because to test its significance
the tables Kendall has computed are needed, when N is small.

Relation Between Two Traits Expressed in Qualitative
Categories. The existence of relation can be tested by the
chi-square test, but its measurement is not always possible without
additional assumptions. The case in which there are just two
categories for each trait has been discussed in an earlier section of
this chapter. If for each trait the qualitative categories can be
placed in a meaningful order, it may be possible to assign numerical
scores to the categories and compute a product moment correlation
with these scores. One assumption which might be used to obtain
such scores if it appears reasonable is that the various categories
are spaced at equal intervals along a scale, so that the numbers
0, 1, 2, 3, 4, . . . may be assigned to the categories. Another is
that each trait has a normal distribution. Ranks may then be
normalized as described in Chapter 17.

If the categories for either variable do not have a meaningful
order there is no satisfactory way of measuring relationship,
although χ² can be computed to test the hypothesis that no
relationship exists. If the categories for both variables can be
arranged in meaningful order and if χ² is large, so that the
hypothesis of independence of the two variables is rejected, the
research worker is likely to feel an intense need of some measure of
relationship. For this situation Karl Pearson proposed the
coefficient of contingency

$C = \sqrt{\frac{\chi^2}{N + \chi^2}}$

This is not a very satisfactory measure of relationship but under
the circumstances no better measure is available. If χ² = 0, then
C = 0, but that situation very seldom occurs. C cannot be negative.
Its maximum value for a table of k rows and k columns is
$\sqrt{\frac{k - 1}{k}}$. Imagine 200 cases distributed in 4 rows and 4 columns
with 50 cases in each diagonal cell and all other cells empty. The
200 scores could not be distributed in a way to indicate higher
relationship. For this table

$\chi^2 = 600$     and     $C = \sqrt{\frac{600}{800}} = .87$, not 1.00
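The diagonal example can be reproduced directly. The sketch below computes chi-square from the observed and expected cell frequencies of the 4 by 4 table with 50 cases in each diagonal cell, then the coefficient of contingency √(χ²/(N + χ²)) and its ceiling √((k − 1)/k). Function names are illustrative.

```python
import math

def chi_square(table):
    """Chi-square for a two-way frequency table, with the total N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n

def contingency_coefficient(chi2, n):
    """Pearson's coefficient of contingency."""
    return math.sqrt(chi2 / (n + chi2))

k = 4
table = [[50 if i == j else 0 for j in range(k)] for i in range(k)]

chi2, n = chi_square(table)
c = contingency_coefficient(chi2, n)
c_max = math.sqrt((k - 1) / k)

print(round(chi2, 1), round(c, 2), round(c_max, 2))
```

Because the ceiling depends on k, values of C from tables of different sizes are not directly comparable.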


REFERENCES 

1. DuBois, Philip H., “A Note on the Computation of Biserial r in Item Valida- 
tion," Psychometrika, 7 (June 1942), 143-147. 

2. Dunlap, Jack W., “Note on the Computation of Tetrachoric Correlation," 
Psychometrika, 5 (June 1940), 137-140. 

3. Dunlap, Jack W., “Note on Computation of Biserial Correlations in Item 
Evaluation,” (Including Table of p and p/z, basic data sheet, and nomograph), 
Psychometrika, 1 (June 1936), 51-60. 

4. Flanagan, John C., “General Considerations in the Selection of Test Items 
and a Short Method of Estimating the Product-Moment Coefficient from the 
Tails of the Distribution,” Journal of Educational Psychology, 30 (1939), 
674-680. 

5. Hotelling, Harold and Pabst, Margaret Richards, “Rank Correlation and 
Tests of Significance Involving no Assumption of Normality,” The Annals 
of Mathematical Statistics, 7 (March 1936), 29-43. 

6. Jackson, Dunham, “The Algebra of Correlation,” The American Mathematical 
Monthly, 31 (March 1924), 110-121. 

7. Johnson, N. L. and Welch, B. L., “Applications of the Non-Central
t-Distribution,” Biometrika, 31 (1940), 362.

8. Kelley, Truman L., Fundamentals of Statistics, Cambridge, 1947, Harvard 
University Press. (See Index for references to many topics treated in this 
chapter.) 

9. Kelley, Truman L., “The Selection of Upper and Lower Groups for the Valida- 
tion of Test Items,” Journal of Educational Psychology, 30 (1939), 17-24. 

10. Kendall, Maurice G., The Advanced Theory of Statistics. Vol. I. Philadelphia,
1943, J. B. Lippincott Company.

11. Kendall, Maurice G., Rank Correlation Methods, London, 1948, Charles Griffin
and Company, Ltd.

12. Lev, Joseph, “The Point Biserial Coefficient of Correlation,” Annals of
Mathematical Statistics, 20 (1949), 125-126.

13. Mosteller, Frederick, “On Some Useful Inefficient Statistics,” Annals of
Mathematical Statistics, 17 (1946), 377-408.

14. Pearson, Karl, “On the Correlation of Characters Not Quantitatively
Measurable,” Philosophical Transactions, Series A, Vol. 195, (1901) 1-47.

15. Pearson, Karl, “On a New Method for Determining the Correlation Between
a Measured Character A and a Character B, of Which only the Percentage
of Cases wherein B Exceeds (or falls short of) a Given Intensity is Recorded
for Each Grade of A,” Biometrika, 7 (1909), 96-105.

16. Pitman, E. J. G., “Significance Tests which may be Applied to Samples from
any Populations, Part II, The Correlation Coefficient Test,” Journal of the
Royal Statistical Society, Supplement 4, 255.

17. Spaney, Emma, “Personality Tests and the Selection of Nurses,” Nursing
Research, 1 (Feb. 1953) 4-26. (Data used here were taken from the original
manuscript of which this paper is an abridgement.)

18. Spearman, Charles, “The Proof and Measurement of Association Between
Two Things,” American Journal of Psychology, 15 (1904), 72-101.

19. Tate, R. F., “The Biserial and Point Correlation Coefficients,” Institute of
Mathematical Statistics, University of North Carolina, Mimeographed
Series #14, (for limited distribution). A special report of research under Office
of Naval Research Project N R 042031.

20. Wallis, W. A., “The Correlation Ratio for Ranked Data,” Journal of the
American Statistical Association, 34 (1939), 533-538.


12 The Statistics of Measurement 


All measurement is infested with error. The physicist,
the engineer, the astronomer, the surveyor, as well as the
educator and the psychologist are vividly aware of this liability in
their data and aware of the importance of holding measurement 
error to a minimum and of estimating the effects of such error 
as cannot be controlled. In the preceding chapters of this book we 
have apparently assumed that observations could be taken at their 
face value and could be treated as perfectly accurate measures of 
individuals on whatever traits they purport to measure. The 
discussion in earlier chapters of this text has proceeded almost as 
though the only important uncertainties in statistical studies were 
uncertainties about population parameters when these are es- 
timated from sample statistics. However, in the social sciences 
the variance due to errors in measuring the individuals selected 
is often larger than the variance due to sampling errors in selecting 
individuals. Furthermore while sampling errors have a random 
effect upon the statistics computed, measurement errors affect 
certain statistics in a systematic not a random fashion, and so 
cannot be safely disregarded. If measurement error has remained 
unnoticed in the preceding chapters that has been done for the 
sake of the reader, to allow him to grasp one thing at a time. 

The discussion in the present chapter will be directed toward 
answering such questions as the following. If the same individuals 
take two forms of a mental test and the correlation between the 
two sets of scores is found to be only, say, .60, will the test be 
useful in a situation where predictions are to be made concerning 
the performance of individuals? Will it be useful in a study con- 
cerning group performance? How does the variability among 
scores on one test form condition the answers to these questions? 
If the test can be lengthened, how will that affect the correlation 
between forms? Is it feasible to make the test long enough to 
yield a correlation of .90 between two forms? What is the differ- 
ence in the information provided by the correlation between the 
results of giving the same test twice, of giving two test forms on 
two different days, of giving two test forms on the same day, of 
giving one test form and correlating scores on chance halves? If 
the correlation between two test forms is low, how will that fact 
affect a test of significance employing that test as a measure of 
the trait concerned? 

Symbols. In this chapter we shall be concerned with multiple 
measures made on more than one trait for many individuals. It 
will be necessary to distinguish the trait, to distinguish the in- 
dividual, to distinguish the measure. In order to be unambiguous 
the symbolism must be explicit, and consequently somewhat 
cumbersome. 

The letters X, Y, Z, W will be used to name the traits (the 
variates). Thus tests on American history might be denoted X 
and vocabulary tests denoted Y. 

$X_{h\alpha}$ will indicate the score on trial $h$ for individual $\alpha$. The 
first subscript will represent the test form used or the repetition 
made. Thus $X_{37}$ might mean the score made by individual 7 on 
the third form of test X, or it might mean the score made by 
individual 7 on the third trial of task X. 

In Table 12.1 are displayed the symbols for $mN$ scores of $N$ 
individuals on $m$ forms (or $m$ trials) of test X, together with the 
various means and variances. For the sample of N individuals 
the means and variances of the different test forms present no new 
concept. The set of trials furnishes a sample of m observations 
of the performance of each individual with mean and variance 
as indicated. 

The student should note that the word “test” as used in the 
present discussions connotes some measure of an individual and 
is not related to “test of significance” as used elsewhere in the 
text. 

For the universe of individuals the mean for trial $h$ is the 
expected value of $X_{h\alpha}$ summed over individuals. This concept is 
thoroughly familiar but the subscript $\alpha$ will now be attached to 
the symbol $E$ to indicate that summation is over individuals, not 
trials, thus 

(12.1)    $\mu_h = E_\alpha(X_{h\alpha})$ 


For the universe of trials for individual $\alpha$ the mean is the 
expected value of $X_{h\alpha}$ summed over the trials. This value will be 
denoted “tau sub alpha,” 

(12.2)    $\tau_\alpha = E_h(X_{h\alpha})$ 

the subscript $h$ being used to indicate that summation is over the 
trials, not the individuals. 

[TABLE 12.1. Symbolic representation of the scores $X_{h\alpha}$ of $N$ 
individuals on $m$ trials (or forms) of test X, with the means and 
variances of each trial over individuals and of each individual over 
trials, for the sample and for the universe.] 

For the universe of individuals the variance on test form $h$, 
or trial $h$, will be denoted 

(12.3)    $\sigma_{h\cdot}^2 = E_\alpha(X_{h\alpha} - \mu_h)^2$ 

while for the universe of trials for individual $\alpha$ the variance will 
be denoted 

(12.4)    $\sigma_{\cdot\alpha}^2 = E_h(X_{h\alpha} - \tau_\alpha)^2$ 


The score on test form $h$ (or trial $h$) made by individual $\alpha$ may be 
described by the equation 

(12.5)    $X_{h\alpha} = \tau_\alpha + e_{h\alpha}$ 

Here $e_{h\alpha}$ is an error made in measuring individual $\alpha$ on test form 
$h$, and $\tau_\alpha$ is the expectation of $X_{h\alpha}$ for the individual, or the value 
around which the various measures of that individual fluctuate 
as they are affected by different kinds of measurement error. $\tau_\alpha$ 
is not amenable to direct observation but is surrounded by a 
kind of haze of measurement error in the same way that an 
inaccessible population parameter is shrouded in a haze of sampling 
error. 

The variance of $X_{h\alpha}$ from individual to individual is denoted 
$\sigma_{h\cdot}^2$ for the population, and $s_{h\cdot}^2$ for the sample, and is called an 
observed variance, being the variance of observed scores. If the 
different test forms are truly comparable measures of the same 
trait it may be assumed that they all have equal variance, so that 

(12.6)    $\sigma_{1\cdot}^2 = \sigma_{2\cdot}^2 = \cdots = \sigma_{m\cdot}^2 = \sigma_x^2$ 

where $\sigma_x^2$ is the expectation of the variance of all possible scores of 
all individuals around the population mean $\mu$. 

Even when Equation (12.6) holds, the variances of the different 
test forms need not be equal for a particular sample, 

          $s_{1\cdot}^2 \neq s_{2\cdot}^2 \neq \cdots \neq s_{m\cdot}^2$ 

The variance of $\tau$ from individual to individual, called $\sigma_\tau^2$ 
for the population and $s_\tau^2$ for the sample, is the true variance 
among individuals. It is that component of the observed variance 
which is due to genuine differences among individuals 

(12.7)    $\sigma_\tau^2 = E_\alpha(\tau_\alpha - \mu)^2$ 

There is no way to compute $\sigma_\tau^2$ or $s_\tau^2$ directly, as $\tau$ cannot be 
measured directly. Methods of inferring $\sigma_\tau^2$ or $s_\tau^2$ indirectly will 
be discussed later in the chapter. 


Components of an Observed Score. The error component 
indicated as $e_{h\alpha}$ in Equation (12.5) can be considered as arising 
from several different sources. The performance of any individual 
fluctuates from time to time partly because of changes in the 
individual himself, and partly because of changes in the environ- 
ment. There are errors in the observer so that the same observer 
varies in his judgment from time to time and the judgments of 
different observers on the same performance are not uniform. 
The measuring instrument itself may be inconsistent. 

In some situations it is possible to make separate estimates of 
the variance due to each of the three kinds of error. However, 
in most situations they are hopelessly confounded and their effects 
cannot be disentangled. Each of them may involve either ran- 
dom or systematic error, or both. In practical problems it is 
essential that the investigator try in advance of gathering his 
data to foresee what will be the chief sources of error, not only 
in order that he may control them as much as possible, but also 
in order that he may use the appropriate method of measuring the 
effect of such error on the reliability of his observations. There- 
fore, before going further, it will be well to consider some of the 
most common causes which play upon the “real” component $\tau$ 
and the error component $e$ of an observed score. 

If the individual is a person taking a psychological test, $\tau$ is 
affected by the level of his general ability, by his familiarity with 
such tests and his general ability to understand instructions, by 
his knowledge of the pertinent subject matter or his skill in the 
function tested, and the like. If the individual is a rat running 
a maze, $\tau$ is affected by his learning ability, his maturity, his 
physical vigor, the keenness of his senses, his testwiseness, and 
the like. If the individual is a pipe, the internal diameter of which 
is to be found, or a cement mixture tested for the pressure it 
will bear, $\tau$ represents the average of all the large number of 
measurements which could be made, measurements which will 
differ one from the other because the object measured is not 
completely uniform, the measuring instrument is not completely 
reliable, and the person making the measurement has certain 
human frailties of touch, vision, etc., which make his 
measurements not absolutely dependable. 

One component of error is due to the observer. If a test 
performance is to be timed, no human observer can be completely 
consistent in his timing even with the most accurate of 
stopwatches. Moreover, some observers will show a systematic leniency 
in timing, others a systematic tendency to call time a split-second 
too soon. When judgment of quality enters into the process of 
scoring, this component of error is never negligible and is seldom 
entirely random. Observer error is never wholly absent from 
reading a gauge such as a thermometer or pressure gauge, and it 
varies with the type of instrument. Observer error is at a mini- 
mum and perhaps wholly eliminated from scores on an objective 
test of the pencil and paper type. 

Another component of error is due to fluctuation in the in- 
dividual measured or in the environment. Since a person is never 
precisely the same at two different moments in time, temporary 
states of physical well being, emotional tension, preoccupation 
with other matters may affect his performance favorably or un- 
favorably as compared with his typical performance. Attention 
fluctuates continually and a momentary lapse of attention or 
memory may affect his performance. Distractions occur during 
the testing period, a pencil breaks, a fire siren sounds nearby, a 
neighbor sneezes disconcertingly. Conditions preceding the test- 
ing period are never uniform for all participants, and are usually 
unknown to the examiner. Individuals do not completely shed 
their worries or their excitements when they begin a test. In cer- 
tain kinds of tests the amount of recent practice on specific skills 
may affect the score. If the individual is a person, this component 
of error is always large. If it is an inanimate object this component 
is likely to be relatively smaller but not necessarily absent. Ma- 
terials expand and contract and wear out. Materials are not 
completely uniform and samples cut from different parts of a 
piece of cloth or different points on a metal rod do not test alike. 
Machines do not perform with complete consistency at all times. 

Another sort of error comes from inconsistencies in the measur- 
ing instrument. This component is of paramount importance in 
pencil and paper tests for which a limited number of items has 
been selected out of a large pool of items. If different sets are 
randomly selected, they will not be equally difficult for all indi- 
viduals measured, but one set will favor some individuals and 
penalize others. This error is really a chance error produced by 
sampling of items. It is the only error inherent in the test itself 
and the only one affecting that which can properly be called the 
reliability of the test while the reliability of the observations is affected 
by other random errors as well. 




If test items are not selected at random (and they usually are 
not) there may be a consistent or systematic error as well as a 
random error in any particular set. Such consistent errors affect 
the validity of the test. This validity is affected not only by con- 
sistent errors in the test items but by consistent errors of the 
observer and of the individual measured. 

Over and above all the preceding, there are certain chance 
errors such as those related to luck in guessing the right answer 
on partial information, or no information. These may well be 
absorbed into one or the other of the preceding components of error. 

Effect of Measurement Error on the Mean. Random errors 
have a negligible effect upon the mean of a sample, because posi- 
tive and negative errors tend to cancel each other. However, 
because they increase the variability of the sample, as described 
in the next section, they increase the standard error of the mean 
and consequently decrease the reliability of the mean. Random 
errors may even completely obscure a genuine difference between 
two or more means. Therefore, when a sample has shown a non- 
significant difference between means the research worker is always 
obliged to consider the possibility that a real difference exists but 
that his observations are too unreliable to reveal it. For this 
purpose (as well as for other purposes) he needs to study the 
reliability of his observations. 

If all possible measures of each individual were taken, then the 
mean of $X_{h\alpha}$ would be identical with the mean of $\tau_\alpha$ for the sample 
as well as for the population. Even so the variance of $X_{h\alpha}$ would 
be larger than the variance of $\tau_\alpha$. However, one never has all 
possible measures of individuals, but usually has only one or two 
measures for each. Then the sample mean of $X_{h\alpha}$ is not necessarily 
identical with the sample mean of $\tau_\alpha$, though the discrepancy is 
negligible if errors are random. 

If only one or two measures are available for each individual 
and if in these there is a systematic error, one has no assurance 
that the effect of errors on the mean is small and no assurance that 
errors increase the variance. Bias in the observer, an abnormally 
easy or abnormally difficult test form, some aspect of the testing 
situation which served to distract the attention of all individuals 
tested, or the like, might depress or raise all scores on a particular 
trial or might depress or raise the scores of some individuals and 
not of all. Under such circumstances errors are not random and 
their effect may be almost anything. It is not usually possible 
to make a statistical test to discover whether errors are random 
or not, and therefore a careful logical scrutiny of the experimental 
situation is particularly important. 

Let $e_{h\alpha}$ and $e_{j\alpha}$ represent the errors in the score of individual 
$\alpha$ on test forms $h$ and $j$. Since only two forms are postulated, we 
can say nothing about the expected values obtained from all 
possible test forms as indicated at the bottom of Table 12.1. The 
expected values obtained by averaging $X_{h\alpha}$ over all individuals in 
the population may be considered. If $X_{h\alpha}$ is a measure which differs 
from $\tau_\alpha$ only because of random error, then the expectation of 
such error is zero and so 

          $E_\alpha(e_{h\alpha}) = 0$ 

(12.8)    and  $E_\alpha(X_{h\alpha}) = E_\alpha(\tau_\alpha + e_{h\alpha}) = E_\alpha(\tau_\alpha) + E_\alpha(e_{h\alpha}) = E_\alpha(\tau_\alpha)$ 


By the same argument, if errors are random because items have 
been assigned to test forms at random, the expectation of scores on 
any other test form is also equal to the expectation of $\tau_\alpha$. 
Consequently if errors are random, the expectation of scores on any 
one form is equal to the expectation of scores on any other, and 
each is equal to the population mean. 

(12.9)    $E_\alpha(X_{h\alpha}) = E_\alpha(X_{j\alpha}) = \cdots = E_\alpha(X_{m\alpha}) = E_\alpha(\tau_\alpha) = \mu$ 

(12.10)   $E_\alpha(X_{h\alpha} - \mu) = E_\alpha(X_{h\alpha}) - \mu = \mu - \mu = 0$ 

and 

(12.11)   $E_\alpha(\tau_\alpha - \mu) = E_\alpha(\tau_\alpha) - \mu = 0$ 


For any particular sample, random errors may produce slight 
discrepancies among the means of different forms. If errors are not 
random, the test forms may be of unequal difficulty so that the 
expectation of errors is not zero over the population and therefore 
two test forms do not have the same population mean. 

Effect of Measurement Error on the Variance. Random 
errors of measurement consistently inflate the variance so that 
the variance of observed scores is larger than the variance of 
“true” scores. Because people who think intuitively rather than 
algebraically often assume that random errors might sometimes 
increase and sometimes decrease the variance, recourse must be 
made to the formula. By squaring $X_{h\alpha} - \mu = (\tau_\alpha - \mu) + e_{h\alpha}$ and 
taking the expectation of each term for the population of 
individuals, the following formula is obtained: 

(12.12)   $\sigma_x^2 = \sigma_\tau^2 + \sigma_e^2 + 2\rho_{\tau e}\sigma_\tau\sigma_e$ 




If errors of measurement are uncorrelated with “true” scores, the 
last term disappears leaving 

(12.13)   $\sigma_x^2 = \sigma_\tau^2 + \sigma_e^2$ 

The assumption that $\rho_{\tau e} = 0$ appears to be reasonable in very 
many research situations. For the sample, $r_{\tau e}$ will usually not be 
precisely zero. 
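
The partition in Formulas (12.12) and (12.13) can be checked numerically. The following is a minimal sketch, assuming normally distributed “true” scores and errors with arbitrarily chosen standard deviations of 10 and 5; the helper names, the seed, and the group size are illustrative choices and do not come from the text.

```python
import random

def mean(values):
    return sum(values) / len(values)

def variance(values):
    """Variance with denominator N (mean squared deviation)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def covariance(a, b):
    """Covariance with denominator N."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

rng = random.Random(12)      # fixed seed so the sketch is repeatable
n = 20000                    # hypothetical number of individuals

tau = [rng.gauss(50, 10) for _ in range(n)]   # assumed "true" scores
err = [rng.gauss(0, 5) for _ in range(n)]     # random errors, drawn independently
obs = [t + e for t, e in zip(tau, err)]       # X = tau + e, as in (12.5)

var_obs = variance(obs)
var_tau = variance(tau)
var_err = variance(err)
# The cross term of (12.12): 2 * r_te * s_t * s_e equals twice the covariance.
cross = 2 * covariance(tau, err)

print("observed variance:          ", var_obs)
print("true + error + cross term:  ", var_tau + var_err + cross)
print("cross term (near, not at, 0):", cross)
```

In such a run the three-term identity holds exactly for the sample, while the cross term is small but not precisely zero, which is the distinction drawn above between the population and the sample.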

As will be seen later, this inflation of the variance by random 
errors of measurement presents a serious hazard in tests of signif- 
icance, for unless it is possible to obtain fairly reliable observa- 
tions the hypothesis that two or more means are equal can be 
refuted only when differences among the means are very great. 

Indirect Estimate of the “True” Variance $\sigma_\tau^2$. Suppose two 
forms of a measure X are available, which may be called $X_1$ and 
$X_2$, or $X_h$ and $X_j$ if greater generality is desired. 

          $X_{1\alpha} = \tau_\alpha + e_{1\alpha}$  and  $X_{2\alpha} = \tau_\alpha + e_{2\alpha}$ 

The correlation between $X_1$ and $X_2$ is usually called the reliability 
coefficient of the measure. It may be designated $\rho_{xx}$ for the 
population and $r_{xx}$ for a sample. 

Since $X_1$ and $X_2$ are comparable measures, it may be assumed 
that errors are equally variable 

(12.14)   $\sigma_{e_1} = \sigma_{e_2} = \sigma_e$ 

and errors are equally correlated with $\tau$ 

(12.15)   $\rho_{\tau e_1} = \rho_{\tau e_2} = \rho_{\tau e}$ 


If errors are random, $\rho_{\tau e}$ may be assumed to be zero. This is 
usually the case. However, a test might conceivably be so 
designed or so administered that persons with low standing tended 
to incur consistently greater or consistently smaller measurement 
errors than persons with high standing; then $\rho_{\tau e}$ would not be 
zero. If (12.14) and (12.15) hold, then 

          $\sigma_{x_1}^2 = \sigma_{x_2}^2 = \sigma_x^2$ 

The correlation $\rho_{xx}$ may be found by substituting 

          $X_{1\alpha} - \mu = (\tau_\alpha - \mu) + e_{1\alpha}$ for $x$  and  $X_{2\alpha} - \mu = (\tau_\alpha - \mu) + e_{2\alpha}$ for $y$ 

in the formula $\rho_{xy} = \frac{E(xy)}{\sqrt{E(x^2)E(y^2)}}$, taking the expectation, and 
reducing the result by the application of Formulas (12.14) and 
(12.15). By this procedure it may be found that 

(12.16)   $\rho_{xx} = \frac{\sigma_\tau^2 + 2\rho_{\tau e}\sigma_\tau\sigma_e + \rho_{e_1 e_2}\sigma_e^2}{\sigma_x^2}$ 




Now, if errors are random $\rho_{\tau e} = 0$ and $\rho_{e_1 e_2} = 0$. In that case 

(12.17)   $\rho_{xx} = \frac{\sigma_\tau^2}{\sigma_x^2}$ 

and for the sample, approximately, 

(12.18)   $r_{xx} \approx \frac{s_\tau^2}{s_x^2}$ 

Formulas (12.17) and (12.18) constitute an important interpretation 
of the reliability coefficient. If two measures differ from each 
other only because of random errors of measurement, the correlation 
between the two is the ratio of the variance of true scores to the variance 
of observed scores. 

Solving (12.17) for $\sigma_\tau^2$ and (12.18) for $s_\tau^2$ gives the formulas 

(12.19)   $\sigma_\tau^2 = \sigma_x^2\rho_{xx}$  or  $\sigma_\tau = \sigma_x\sqrt{\rho_{xx}}$ 

(12.20)   $s_\tau^2 \approx s_x^2 r_{xx}$  or  $s_\tau \approx s_x\sqrt{r_{xx}}$ 

which provide a method of indirectly estimating the variability 
of the “true scores.” 
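
The interpretation of the reliability coefficient as a variance ratio can be illustrated by simulation. The sketch below assumes two parallel forms built from the same normally distributed “true” scores (standard deviation 10) with independent random errors (standard deviation 5), so that the population reliability is 100/125 = .80; all names and numerical settings are hypothetical.

```python
import math
import random

def variance(values):
    """Variance with denominator N."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def pearson_r(a, b):
    """Product-moment correlation of two equally long lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

rng = random.Random(1953)    # fixed seed for repeatability
n = 20000                    # hypothetical number of individuals

tau = [rng.gauss(0, 10) for _ in range(n)]       # assumed "true" scores
form_1 = [t + rng.gauss(0, 5) for t in tau]      # X1 = tau + e1
form_2 = [t + rng.gauss(0, 5) for t in tau]      # X2 = tau + e2

r_xx = pearson_r(form_1, form_2)       # sample reliability coefficient
s2_x = variance(form_1)                # observed variance of one form
s2_tau_estimate = s2_x * r_xx          # indirect estimate, Formula (12.20)
s2_tau_actual = variance(tau)          # known here only because tau was simulated

print("r_xx:", r_xx)
print("estimated true variance:", s2_tau_estimate)
print("actual true variance:   ", s2_tau_actual)
```

In real data the last quantity is never available; the simulation exposes it only so that the indirect estimate can be compared with the value it is meant to approximate.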

Indirect Estimate of the Error Variance. By Formula (12.13) 
the total observed variance was partitioned into two portions, 
the variance of true scores plus the variance of errors of 
measurement, $\sigma_x^2 = \sigma_\tau^2 + \sigma_e^2$. 

As an estimate for $\sigma_\tau^2$ has now been found to be $\sigma_x^2\rho_{xx}$, an estimate 
for $\sigma_e^2$ can be found by substitution. Then 

(12.21)   $\sigma_e^2 = \sigma_x^2 - \sigma_x^2\rho_{xx} = \sigma_x^2(1 - \rho_{xx})$ 

(12.22)   $\sigma_e = \sigma_x\sqrt{1 - \rho_{xx}}$ 


At this point the student may profitably look again at Formula 
(10.20) and compare it with (12.22) noting that the former involves 
$\rho^2$ and the latter $\rho$. The error $e$ may be thought of as a residual 
error incurred in predicting a true score $\tau$ from an observed score 
X by the regression equation 

          $\tilde{\tau}_\alpha = X_{h\alpha}$  with  $\tilde{\tau}_\alpha - \tau_\alpha = e_{h\alpha}$ 

Then $\sigma_e^2$ is the variance of residual errors corresponding to $\sigma_{y \cdot x}^2$ 
of Chapter 10. 

The corresponding formulas for the sample are reasonable 
approximations if $N$ is not too small. If $N$ is small, $r_{\tau e}$ may differ 
considerably from zero and $N - 2$, which is the appropriate 
denominator for $s_e^2$, may be rather different from $N - 1$, which 
is the denominator for $s_x^2$. With these reservations, however, we 
may state the sample formulas as 

(12.23)   $s_e^2 \approx s_x^2(1 - r_{xx})$ 

and 

(12.24)   $s_e \approx s_x\sqrt{1 - r_{xx}}$ 
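
Formulas (12.20) and (12.24) are simple enough to wrap in two small functions. The sketch below is illustrative only: the function names are invented here, and the observed standard deviation of 10 and reliability of .84 are hypothetical figures chosen so that the arithmetic is easy to follow.

```python
import math

def true_score_sd(s_x, r_xx):
    """Indirect estimate of the true-score standard deviation, Formula (12.20)."""
    return s_x * math.sqrt(r_xx)

def error_sd(s_x, r_xx):
    """Indirect estimate of the standard deviation of errors, Formula (12.24)."""
    return s_x * math.sqrt(1.0 - r_xx)

# Hypothetical figures, not taken from the text.
s_x = 10.0
r_xx = 0.84

s_tau = true_score_sd(s_x, r_xx)   # 10 * sqrt(.84)
s_e = error_sd(s_x, r_xx)          # 10 * sqrt(.16)

print("estimated true-score s.d.:", s_tau)
print("estimated error s.d.:     ", s_e)
# The two estimated variances add back to the observed variance, as in (12.13).
print("s_tau^2 + s_e^2:", s_tau ** 2 + s_e ** 2)
```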


Effect of Measurement Error on a Coefficient of Correlation. 
Let us suppose that $X_1$ and $X_2$ are two measures of trait X and 
that $Y_1$ and $Y_2$ are two measures of trait Y, and that 

          $X_{1\alpha} = \tau_\alpha + e_{1\alpha}$      $Y_{1\alpha} = \upsilon_\alpha + d_{1\alpha}$ 
(12.25) 
          $X_{2\alpha} = \tau_\alpha + e_{2\alpha}$      $Y_{2\alpha} = \upsilon_\alpha + d_{2\alpha}$ 

where $\tau_\alpha$ and $\upsilon_\alpha$ (upsilon) are the measures of those traits for 
individual $\alpha$ without measurement errors and $e$ and $d$ are the 
respective errors. Then $\rho_{\tau\upsilon}$ is the “true” correlation between X and Y, 
that is the correlation which would exist if there were no errors 
of measurement, while $\rho_{xy}$ is the correlation between observed, 
fallible measures. 

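We have seen that 

          $\mu_x = \mu_\tau$  and  $\bar{X}$ approximates $\bar{\tau}$ 

          $\sigma_x > \sigma_\tau$  and  $s_x$ tends to be larger than $s_\tau$ 

What is the relation of $\rho_{xy}$ to $\rho_{\tau\upsilon}$ and of $r_{xy}$ to $r_{\tau\upsilon}$? 

To answer this question, we shall assume that 

          $\sigma_{e_1} = \sigma_{e_2}$,  $\sigma_{d_1} = \sigma_{d_2}$,  $E(e_1) = E(e_2) = E(d_1) = E(d_2) = 0$ 

          $\rho_{\tau e_1} = \rho_{\tau e_2}$,  $\rho_{\upsilon d_1} = \rho_{\upsilon d_2}$ 

The student should paraphrase each of these statements in words. 
We have already seen in Formula (12.19) that $\sigma_\tau^2 = \sigma_x^2\rho_{xx}$. 
Similarly $\sigma_\upsilon^2 = \sigma_y^2\rho_{yy}$ 
and 

          $E(X - \mu_x)(Y - \mu_y) = \rho_{\tau\upsilon}\sigma_\tau\sigma_\upsilon + \rho_{\tau d}\sigma_\tau\sigma_d + \rho_{\upsilon e}\sigma_\upsilon\sigma_e + \rho_{ed}\sigma_e\sigma_d$ 

But 

(12.26)   $\rho_{xy} = \frac{E(X - \mu_x)(Y - \mu_y)}{\sigma_x\sigma_y}$ 

(12.27)   $\rho_{xy} = \frac{\rho_{\tau\upsilon}\sigma_\tau\sigma_\upsilon + \rho_{\tau d}\sigma_\tau\sigma_d + \rho_{\upsilon e}\sigma_\upsilon\sigma_e + \rho_{ed}\sigma_e\sigma_d}{\sigma_x\sigma_y}$ 

If errors are random they are uncorrelated with true scores so 
$\rho_{\tau d} = 0$ and $\rho_{\upsilon e} = 0$, and they are uncorrelated with each other so 
$\rho_{ed} = 0$. In that case 

(12.28)   $\rho_{xy} = \frac{\rho_{\tau\upsilon}\sigma_\tau\sigma_\upsilon}{\sigma_x\sigma_y} = \rho_{\tau\upsilon}\sqrt{\rho_{xx}}\sqrt{\rho_{yy}}$ 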


and therefore 

(12.29)   $\rho_{\tau\upsilon} = \frac{\rho_{xy}}{\sqrt{\rho_{xx}\rho_{yy}}}$ 

The sample formula corresponding to (12.27) is obtained by 
summing over $N$ cases instead of taking the expectation. It holds 
precisely: 

(12.30)   $r_{xy} = \frac{r_{\tau\upsilon}s_\tau s_\upsilon + r_{\tau d}s_\tau s_d + r_{\upsilon e}s_\upsilon s_e + r_{ed}s_e s_d}{s_x s_y}$ 

Even if errors are random, the correlations $r_{\tau d}$, $r_{\upsilon e}$ and $r_{ed}$ would 
usually be only approximately and not precisely zero for a sample. 
Therefore, the sample formulas analogous to (12.28) and (12.29) 
must be considered as approximations only: 

(12.31)   $r_{xy} \approx r_{\tau\upsilon}\sqrt{r_{xx}}\sqrt{r_{yy}}$ 

(12.32)   $r_{\tau\upsilon} \approx \frac{r_{xy}}{\sqrt{r_{xx}r_{yy}}}$ 


The preceding formulas indicate that when measurement error 
is random, the correlation of observed scores X and Y is 
numerically smaller than the correlation of “true” scores $\tau$ and $\upsilon$. This 
lowering or attenuation of the correlation coefficient is inescapable 
since the correlation coefficients $r_{xx}$ and $r_{yy}$ for the sample as well 
as $\rho_{xx}$ and $\rho_{yy}$ for the population are less than 1. Formulas (12.29) 
and (12.32), which give an estimate of the correlation between 
scores freed from measurement error, are called the correction for 
attenuation. 

It must be noted that the correction for attenuation was ob- 
tained on the assumption that three of the four terms in the 
numerator of (12.27), or of (12.30), were zero. This assumption 
is almost impossible to verify empirically, but the research worker 
will find it instructive to think about situations in which it is 
unreasonable on a priori grounds. Two illustrations will be given. 
The possible variety of such is almost inexhaustible. 

Suppose that one form of test X and one form of test Y are 
given on the same day, and that a few individuals are suffering 
from some unusual strain which adversely affects their 
performance on both. Then $r_{ed}$ is likely to be not zero, but positive. 
Consequently, the right-hand member of (12.30) is likely to be larger 
than the right-hand member of (12.31) and so Formula (12.32) 
will overestimate $r_{\tau\upsilon}$. 

Suppose only one form of test X is available and $r_{xx}$ is the 
correlation between two applications of the same test form. Then 
it cannot be assumed that $r_{e_1 e_2} = 0$ since the errors an individual 
makes on one application of the test will not be unrelated to 
those made on the other. If $r_{e_1 e_2}$ is positive, $r_{xx}$ is larger than 
$s_\tau^2/s_x^2$. Because $r_{xx}$ enters the denominator of Formula (12.32), 
$r_{\tau\upsilon}$ is then underestimated by that formula. 

The correction for attenuation, if applicable under the 
assumptions, becomes a sort of ceiling indicating the highest value to 
which the correlation between two measures could be pushed by 
improving the reliability of measurement. Suppose two forms of 
an arithmetic reasoning test have shown an intercorrelation of 
$r_{xx} = .66$, and two forms of a verbal reasoning test an 
intercorrelation of $r_{yy} = .71$. The correlations between a single form of the 
arithmetic and a single form of the verbal test are variously 
$r_{xy} = .55$, .49, .54, and .51. Is it conceivable that the two tests 
are actually measuring the same function and correlations between 
them are less than unity only because we have imperfect measures 
of each? In order to apply Formula (12.32) some average of the 
four values of $r_{xy}$ is needed. Strictly speaking we should average 
the four covariances but no data are given concerning the 
covariances or the standard deviations. (The covariance of $x$ and $y$ is 
defined as $E(X - \mu_x)(Y - \mu_y) = \rho_{xy}\sigma_x\sigma_y$.) The arithmetic mean 
of the four correlations is $\frac{1}{4}(.55 + .49 + .54 + .51) = .5225$. The 
geometric mean is $[(.55)(.49)(.54)(.51)]^{1/4} = .522$. If each of the 
4 values of $r$ is converted to $z$, the $z$'s averaged and the average of 
the $z$'s is converted back to $r$, the result is .52. Obviously the 
method of averaging will have little effect on the outcome. 

Then      $r_{\tau\upsilon} \approx \frac{.52}{\sqrt{(.66)(.71)}} = .76$ 

In interpretation it may be said in passing that reliability 
coefficients of .66 and .71 make the tests of little value as 
instruments for measuring individuals. 

The correlation between the two traits could not be expected 
to rise higher than about .76 even if all errors of measurement could 
be eliminated. The functions measured by the two tests cannot 
be held to be identical. 
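
The arithmetic of this illustration can be retraced in a few lines. The sketch below uses only the figures quoted in the example; the function name `correct_for_attenuation` is an invented label for Formula (12.32), and the $z$ transformation is taken to be the inverse hyperbolic tangent.

```python
import math

def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Estimated correlation between true scores, Formula (12.32)."""
    return r_xy / math.sqrt(r_xx * r_yy)

# The four observed correlations between single forms of the two tests.
r_values = [0.55, 0.49, 0.54, 0.51]

# Three ways of averaging the four correlations.
arithmetic_mean = sum(r_values) / len(r_values)
geometric_mean = math.prod(r_values) ** (1.0 / len(r_values))
mean_z = sum(math.atanh(r) for r in r_values) / len(r_values)   # r -> z
r_from_mean_z = math.tanh(mean_z)                               # z -> r

print("arithmetic mean:", round(arithmetic_mean, 4))
print("geometric mean: ", round(geometric_mean, 3))
print("via mean z:     ", round(r_from_mean_z, 2))

# Correction for attenuation with reliabilities .66 and .71.
ceiling = correct_for_attenuation(0.52, 0.66, 0.71)
print("estimated true correlation:", round(ceiling, 2))
```

All three averages agree to two decimal places, so the corrected value is essentially the same whichever one is used.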

The Coefficient of Reliability. If two measures of trait X differ 
only by random errors of measurement, 

(12.33)   $\rho_{xx} = \frac{\sigma_\tau^2}{\sigma_x^2}$ 

With the qualifications already stated for the sample formulas 
(12.31) and (12.32), the analogous formula for the sample holds 
approximately, 

(12.34)   $r_{xx} \approx \frac{s_\tau^2}{s_x^2}$ 


These formulas will be considered as a definition of the coefficient of 
reliability, or the reliability coefficient of measure X. 

The question of how to obtain $r_{xx}$ from data will be discussed 
in a later section. It presents certain difficulties about which a 
great deal has been written. Within the scope of the present book 
only the major problems can be considered. 

Effect on the Coefficient of Reliability of Changing the Length 
of the Test. Sometimes when the coefficient of reliability is 
unsatisfactorily low, the test maker wishes to know how much it 
might be increased if the test forms were lengthened by the 
addition of a specified amount of similar material, or how much 
material would need to be added to the tests in order to produce a 
reliability coefficient of a given size. Conversely, he may feel 
that the test requires more testing time than can be justified and 
he may wish to estimate how much the reliability coefficient would 
be decreased if the test were shortened. 

To answer any of these questions one must assume a homo- 
geneous body of material from which test items are chosen at 
random so that errors due to selection of test material are random 
errors. One must also assume that errors due to variation within 
the individual tested, or to variation in the administration of the 
test, are random errors. Then it can be assumed that the 
covariance for any two forms is the same as for any other two forms, or 

          $E(X_h - \mu)(X_j - \mu) = \rho_{xx}\sigma_x^2$ 

for any $h$ and $j$. Assumptions which are intuitively clearer though 
somewhat more restrictive are 

(1) all forms are equally variable, so $\sigma_{1\cdot}^2 = \sigma_{2\cdot}^2 = \cdots = \sigma_{m\cdot}^2$, and 
(2) all correlations between forms are the same, so that the 
correlation between any two different forms is $\rho_{xx}$. 

Now let ${}_{2}\rho_{xx}$ be the correlation between $(X_1 + X_2)$ and $(X_3 + X_4)$, 
${}_{3}\rho_{xx}$ be the correlation between $(X_1 + X_2 + X_3)$ and $(X_4 + X_5 + X_6)$, 
and ${}_{p}\rho_{xx}$ be the correlation between $(X_1 + \cdots + X_p)$ 
and $(X_{p+1} + \cdots + X_{2p})$. 

(12.35)   Then  ${}_{2}\rho_{xx} = \frac{2\rho_{xx}}{1 + \rho_{xx}}$ 

(12.36)   and approximately  ${}_{2}r_{xx} \approx \frac{2r_{xx}}{1 + r_{xx}}$ 

(12.37)   also  ${}_{p}\rho_{xx} = \frac{p\rho_{xx}}{1 + (p - 1)\rho_{xx}}$ 

(12.38)   and approximately  ${}_{p}r_{xx} \approx \frac{p\,r_{xx}}{1 + (p - 1)r_{xx}}$ 

Formula (12.38) is commonly called the Spearman-Brown Prophecy 
Formula. The names of Charles Spearman and William Brown 
are both attached to it because papers by these men appeared 
simultaneously in the same issue of the same journal. 


Illustrations. 

A. A test with reliability coefficient .52 requires 15 minutes to 
administer. If it could be increased in length to require 60 minutes, how 
much increase might be hoped for in the reliability coefficient? Here 
$p = 60/15 = 4$. Then 

          ${}_{4}r_{xx} \approx \frac{4(.52)}{1 + 3(.52)} = \frac{2.08}{2.56} = .81$ 

Actual experience in lengthening the test and giving it to a new group is 
of course necessary, because the added material may not be entirely 
comparable to the original items, and individuals may work either more 
or less effectively in the longer period. 

B. If a reliability coefficient of .90 were required for the purpose in 
hand, how long would the test need to be? Now $p$ is the unknown to be 
found from the equation 

          $.90 = \frac{p(.52)}{1 + (p - 1)(.52)}$ 

Solving this equation gives $p = 8.3$ so the time presumably required would 
be (8.3)(15) minutes or 125 minutes. If that appears to be more testing 
time than can be suitably allowed, the test maker may consider whether 
he can tolerate a lower reliability, or whether by changing the type of item 
he can secure greater reliability in less testing time. He will have 
discovered without laborious writing of new items that sheer lengthening of 
the test by addition of similar material is not likely to achieve the desired 
result. 
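
Both illustrations can be reproduced with two short functions. In the sketch below, `spearman_brown` is Formula (12.38) and `length_factor_for` is that formula solved algebraically for $p$; both names are invented for this example, and the same assumptions of comparable added material apply.

```python
def spearman_brown(r_xx, p):
    """Anticipated reliability of a test p times as long, Formula (12.38)."""
    return p * r_xx / (1.0 + (p - 1.0) * r_xx)

def length_factor_for(r_xx, target):
    """Length factor p needed to reach a target reliability.

    Obtained by solving target = p*r / (1 + (p - 1)*r) for p.
    """
    return target * (1.0 - r_xx) / (r_xx * (1.0 - target))

# Illustration A: a 15-minute test with reliability .52 lengthened to 60 minutes.
r_lengthened = spearman_brown(0.52, 60 / 15)
print("reliability at four times the length:", round(r_lengthened, 4))

# Illustration B: length needed for a reliability of .90.
p_needed = length_factor_for(0.52, 0.90)
print("length factor needed:", round(p_needed, 1))
print("testing time in minutes:", round(p_needed * 15))
```

Substituting the computed length factor back into `spearman_brown` returns the target reliability, which is a convenient check on the algebra.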

Effect on the Correlation between Two Measures of Changing 
their Reliabilities. A. When the data are given in terms of reliability 
coefficients. (1) Suppose $r_{xy} = .55$ when $r_{xx} = .65$ and $r_{yy} = .75$. 
If each reliability coefficient could be made .85, what value of $r_{xy}$ 
could be anticipated? Let $r'_{xy}$ designate the value of $r_{xy}$ to be 
anticipated when the two reliability coefficients are $r'_{xx} = .85$ and 
$r'_{yy} = .85$. 

By Formula (12.32)   $r_{\tau\upsilon} \approx \frac{.55}{\sqrt{(.65)(.75)}}$ 

and also             $r_{\tau\upsilon} \approx \frac{r'_{xy}}{\sqrt{(.85)(.85)}}$ 

Therefore            $\frac{r'_{xy}}{\sqrt{(.85)(.85)}} \approx \frac{.55}{\sqrt{(.65)(.75)}}$ 

and                  $r'_{xy} \approx \frac{.55(.85)}{\sqrt{(.65)(.75)}} = .67$ 

Generalizing this procedure, we may write 

(12.39)   $r'_{xy} \approx \frac{r_{xy}\sqrt{r'_{xx}r'_{yy}}}{\sqrt{r_{xx}r_{yy}}}$ 

(2) Suppose $r_{xy} = .55$ when $r_{xx} = .65$ and $r_{yy} = .75$. If measures 
of X could be made completely reliable without any change in 
the reliability of Y, what value of $r'_{xy}$ might be anticipated? The 
question implies that $r'_{xx} = 1.00$ and $r'_{yy} = r_{yy}$ and that $r'_{xy}$ is really 
to be $r_{\tau y}$ because measures of X are to be free of error. When these 
values are substituted, Formula (12.39) becomes 

(12.40)   $r_{\tau y} \approx \frac{r_{xy}}{\sqrt{r_{xx}}}$  and similarly  $r_{x\upsilon} \approx \frac{r_{xy}}{\sqrt{r_{yy}}}$ 

Then      $r_{\tau y} \approx \frac{.55}{\sqrt{.65}} = .68$ 
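
Formula (12.39) and its special case (12.40) can be sketched as a single function. The name `projected_correlation` and its parameter names are invented for this example; the figures are those of illustrations (1) and (2).

```python
import math

def projected_correlation(r_xy, r_xx, r_yy, new_r_xx, new_r_yy):
    """Anticipated r_xy after the reliabilities change, Formula (12.39)."""
    return r_xy * math.sqrt(new_r_xx * new_r_yy) / math.sqrt(r_xx * r_yy)

# (1) Both reliabilities raised to .85.
both_improved = projected_correlation(0.55, 0.65, 0.75, 0.85, 0.85)
print("both reliabilities .85:", round(both_improved, 2))

# (2) X made completely reliable, Y unchanged: reduces to Formula (12.40).
x_perfect = projected_correlation(0.55, 0.65, 0.75, 1.00, 0.75)
print("X perfectly reliable:  ", round(x_perfect, 2))
print("r_xy / sqrt(r_xx):     ", round(0.55 / math.sqrt(0.65), 2))
```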
B. When the data are given in terms of the proportionate change in 
length of tests. Suppose it is desired to find the correlation between 
$(X_1 + \cdots + X_p)$ and $(Y_1 + \cdots + Y_q)$ where it is assumed that 
$X_1, \ldots, X_p$ are $p$ comparable measures of X and $Y_1, \ldots, Y_q$ are $q$ 
comparable measures of Y. Let this correlation be designated 
${}_{pq}\rho_{xy}$. We speak here as though $p$ and $q$ were integers but the 
resultant formulas apply even when $p$ and $q$ are fractions. Let 
either one of the following sets of assumptions be made: 

(1) The covariance of any two measures of X is $\rho_{xx}\sigma_x^2$ 
    The covariance of any two measures of Y is $\rho_{yy}\sigma_y^2$ 
    The covariance of any measure of X and any measure of Y 
    is $\rho_{xy}\sigma_x\sigma_y$ 

(2) All measures of X have the same variance, $\sigma_x^2$ 
    All measures of Y have the same variance, $\sigma_y^2$ 
    The correlation between any two measures of X is $\rho_{xx}$ 
    The correlation between any two measures of Y is $\rho_{yy}$ 
    The correlation between any measure of X and any 
    measure of Y is $\rho_{xy}$. 

On the basis of these assumptions 

(12.41)   ${}_{pq}\rho_{xy} = \frac{pq\,\rho_{xy}}{\sqrt{p + p(p - 1)\rho_{xx}}\,\sqrt{q + q(q - 1)\rho_{yy}}}$ 

and approximately 

(12.42)   ${}_{pq}r_{xy} \approx \frac{pq\,r_{xy}}{\sqrt{p + p(p - 1)r_{xx}}\,\sqrt{q + q(q - 1)r_{yy}}}$ 

Suppose $r_{xy}$ has been found to be .35 when $r_{xx} = .58$ and $r_{yy} = .65$. 
It is decided that the test for X can be made twice as long and 
that for Y three times as long as at present. What correlation 
might be anticipated between X and Y? Here $p = 2$ and $q = 3$. 
Then 

          ${}_{23}r_{xy} \approx \frac{6(.35)}{\sqrt{2 + 2(.58)}\,\sqrt{3 + 6(.65)}} = \frac{2.10}{\sqrt{(3.16)(6.90)}} = \frac{2.10}{4.67} = .45$ 

It is interesting to note that Formulas (12.32), (12.36), (12.38), 
and (12.40) are special cases of (12.42) and can be derived from 
it by appropriate substitutions for $p$ and $q$. 
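
The worked example and two of the special cases can be checked with a short function. In this sketch `lengthened_correlation` is an invented name for Formula (12.42); treating the correction for attenuation as the limit of very large $p$ and $q$, and the Spearman-Brown formula as the case in which Y is simply another set of forms of X, are the substitutions assumed here.

```python
import math

def lengthened_correlation(r_xy, r_xx, r_yy, p, q):
    """Anticipated correlation when X is p times and Y q times as long, (12.42)."""
    denom_x = math.sqrt(p + p * (p - 1) * r_xx)
    denom_y = math.sqrt(q + q * (q - 1) * r_yy)
    return p * q * r_xy / (denom_x * denom_y)

# Worked example: r_xy = .35, r_xx = .58, r_yy = .65, p = 2, q = 3.
example = lengthened_correlation(0.35, 0.58, 0.65, 2, 3)
print("anticipated correlation:", round(example, 2))

# Special case: Y taken as further forms of X (r_xy = r_xx = r_yy, p = q)
# gives the Spearman-Brown value for a test four times as long.
sb_case = lengthened_correlation(0.52, 0.52, 0.52, 4, 4)
print("Spearman-Brown case:", round(sb_case, 4))

# Special case: very large p and q approach the correction for attenuation.
limit_case = lengthened_correlation(0.55, 0.65, 0.75, 1e9, 1e9)
print("large p, q:", round(limit_case, 4))
print("r_xy / sqrt(r_xx * r_yy):", round(0.55 / math.sqrt(0.65 * 0.75), 4))
```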

Effect of Measurement Error on a Regression Coefficient. 
Random errors of measurement have been seen to reduce the 
correlation coefficient and to increase the standard deviation. 
What effect do these errors have upon the regression coefficients

    $\beta_{\upsilon\tau} = \dfrac{E(\upsilon - \mu_\upsilon)(\tau - \mu_\tau)}{\sigma_\tau^2}$   and   $b_{\upsilon\tau} = r_{\upsilon\tau}\dfrac{s_\upsilon}{s_\tau}$ ?

How do these coefficients to predict $\upsilon$ from $\tau$ compare with the
corresponding coefficients to predict y from x?
The question is easily answered by comparing the formulas

    $\beta_{\upsilon\tau} = \dfrac{E(\upsilon - \mu_\upsilon)(\tau - \mu_\tau)}{E(\tau - \mu_\tau)^2}$   and   $\beta_{yx} = \dfrac{E(Y - \mu_y)(X - \mu_x)}{E(X - \mu_x)^2}$

Errors in Y, which is the dependent or predicted variable, can
obviously have no effect on the denominator. Random errors
in X, which is the independent variable, have already been seen
to make $E(X - \mu_x)^2 = \sigma_x^2$ larger than $E(\tau - \mu_\tau)^2 = \sigma_\tau^2$; see
Formula (12.13).

Random errors in either X or Y have no effect upon the numer-
ator. If errors are uncorrelated with each other and with true
scores, then

    $E(X - \mu_x)(Y - \mu_y) = E(\tau - \mu_\tau + e)(\upsilon - \mu_\upsilon + d)$
        $= E(\tau - \mu_\tau)(\upsilon - \mu_\upsilon) + E(\tau - \mu_\tau)d + E(\upsilon - \mu_\upsilon)e + E(ed)$
        $= E(\tau - \mu_\tau)(\upsilon - \mu_\upsilon) + 0 + 0 + 0$

Similar statements hold approximately for the sample coefficients
$b_{\upsilon\tau}$ and $b_{yx}$.

Therefore it may be stated that random errors in the predicted 
or dependent variable have no effect upon a regression coefficient 
and that random errors in the independent variable reduce the 
coefficient. 
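
The conclusion just stated can be illustrated by a small simulation. This is a sketch under assumptions that are not in the text: true scores and errors are drawn from normal distributions with unit variance, the true slope is set arbitrarily at 2, and the seed and sample size are chosen only for reproducibility.

```python
import random

def slope(xs, ys):
    """Least-squares slope for predicting ys from xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

rng = random.Random(12345)   # fixed seed, hypothetical choice
n = 20000

tau = [rng.gauss(0, 1) for _ in range(n)]             # true X scores
upsilon = [2.0 * t + rng.gauss(0, 1) for t in tau]    # true Y scores, slope 2
x_obs = [t + rng.gauss(0, 1) for t in tau]            # X = tau + e
y_obs = [u + rng.gauss(0, 1) for u in upsilon]        # Y = upsilon + d

b_true = slope(tau, upsilon)    # no measurement error
b_err_y = slope(tau, y_obs)     # error in the dependent variable only
b_err_x = slope(x_obs, upsilon) # error in the independent variable only

print(round(b_true, 2), round(b_err_y, 2), round(b_err_x, 2))
```

With these settings the error variance in X equals the true-score variance, so the slope computed from the fallible X should fall near half the true slope, while error in Y alone should leave the slope near 2.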

Effect of Measurement Error on Tests of Significance. In 
general, random errors of measurement make the null hypothesis 
appear more tenable than it would be if true measures could have 
been used. 

In an earlier section of this chapter it was shown that random
errors of measurement have no effect on the mean of a population,
so that $\mu_x = \mu_\tau$. Their chief effect on the mean of a sample is
to increase its sampling variability. As $\sigma_\tau^2$ is increased by meas-
urement error to $\sigma_x^2 = \sigma_\tau^2 + \sigma_e^2$, then $\sigma_{\bar{x}}^2$ is increased from
$\dfrac{\sigma_\tau^2}{N}$ to $\dfrac{\sigma_\tau^2 + \sigma_e^2}{N}$.

When several means are to be compared by use of an F ratio, 
random errors of measurement may raise or lower any of the means 
slightly, but the effect on the numerator of the F ratio is small 
and may be such as to make the means either more alike or less 
alike. However, the denominator sum of squares is consistently 
increased by random errors and so the F ratio is reduced by them. 
The t ratio of the difference between two means to its standard
error is similarly reduced by random errors. When an investigator 
has accepted the hypothesis that two or more means are equal in 
the population he must always keep in mind the possibility that 
a very real difference has been obscured by the measurement error 
present in his data. This is one of the reasons why some evidence 
concerning the reliability of the data should be sought. 


The F test of significance for r is affected in the same direction
by unreliability of the scores. As r is attenuated by error,

    $F = \dfrac{r^2(N - 2)}{1 - r^2}$

is also reduced in size. In general, the greater the measurement
error, the less likely is it that any statistical test will prove sig-
nificant.
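
A rough numerical illustration follows. It assumes the usual F ratio for testing a zero correlation, $F = r^2(N-2)/(1-r^2)$ with 1 and N - 2 degrees of freedom, and the attenuation relation $r_{xy} = r_{\tau\upsilon}\sqrt{r_{xx}r_{yy}}$; the true correlation, the two reliabilities, and the sample size are hypothetical values chosen only for the example.

```python
import math

def f_for_r(r, n):
    """F ratio (1 and n - 2 degrees of freedom) for testing that a
    product moment correlation is zero."""
    return r * r * (n - 2) / (1 - r * r)

r_true = 0.60              # hypothetical correlation between true scores
r_xx, r_yy = 0.68, 0.65    # hypothetical reliabilities of X and Y
n = 50                     # hypothetical sample size

# Correlation expected between the fallible measures (attenuated value)
r_obs = r_true * math.sqrt(r_xx * r_yy)

f_true = f_for_r(r_true, n)
f_obs = f_for_r(r_obs, n)
print(round(r_obs, 2), round(f_true, 1), round(f_obs, 1))
```

The F ratio for the attenuated correlation is markedly smaller than that for the correlation between true scores, which is the point of the paragraph above.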

Method of Estimating a Reliability Coefficient from Data. 
Although the reliability coefficient has been defined by the for- 
mulas 


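    $\rho_{xx} = \dfrac{\sigma_\tau^2}{\sigma_x^2} = 1 - \dfrac{\sigma_e^2}{\sigma_x^2}$   and   $r_{xx} = \dfrac{s_\tau^2}{s_x^2} = 1 - \dfrac{s_e^2}{s_x^2}$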


and some aspects of its meaning have been discussed, there has as 
yet been no discussion of how to obtain it from data. The reader 
has probably noticed this omission and attributed it to oversight. 
The omission has been deliberate. Because every possible method 
of estimating a reliability coefficient from data has certain draw- 
backs, it has seemed better not to identify the concept with a 
particular experimental procedure at too early a stage in the 
reader's thinking. 

There is no direct way to measure either $\tau$ or e but only
$X = \tau + e$. Consequently there is no direct way to compute
$s_\tau^2$ or $s_e^2$. If two measures of $\tau$ can be obtained such that they
differ only by random errors of measurement, $X_1 = \tau + e_1$ and
$X_2 = \tau + e_2$, then the correlation and variance of observed measures
can be used to estimate $s_\tau^2$ as $r_{12}s_x^2$. The difficulty is to set up an
experimental situation in which X; and X» actually conform to 
this assumption. There are four principal experimental procedures 
commonly used, each of which has some advantages and involves 
some serious drawbacks. These are: (A) correlating scores from 
two comparable but different tests given on two different occa- 
sions; (B) correlating scores on а single test given twice with a 
time interval between the repetitions; (C) correlating scores on 
two tests given on the same occasion, or scores on two halves of 
a single application of а single test; and (D) analyzing the variance 
among items on а single application of a single test. In con- 
sidering the appropriateness of each of these four procedures, 
the essential requirements are (1) that $\tau$ shall be a measure of the
trait it is desired to measure and shall not change from one form
to another, or from one occasion to another, and (2) that $e_1$ and $e_2$
shall be random errors due to variation of the kind it is desired
to appraise. It is therefore especially important to make an
a priori analysis of the four procedures. We shall now discuss
the statistical aspects of these four procedures. For practical
advice as to how to construct tests which are likely to conform to
the assumptions made, the reader should consult a treatise on
test construction.², ⁸, ¹¹

A. Correlating scores on two different test forms given on two 
different occasions. Suppose a large pool of test items is available, 
each of which has been constructed to conform to specifications 
drawn up for the test. Then let items be drawn at random from 
this pool and assigned at random to the two test forms. Under 
such a plan, the two forms should measure the same $\tau$ and differ-
ences between them should be due to random errors in the sampling
of items. If items are not assigned at random to the two test
forms, these may be either more alike or less alike than under
random assignment of items. If they are more alike, $s_e^2$ is under-
estimated and $r_{xx}$ is overestimated. If they are less alike, $s_e^2$ is
overestimated and $r_{xx}$ is underestimated. In many situations,
some of which are discussed by Thorndike and by Cronbach in
the references cited, it is impossible to secure a large pool of items
or to construct comparable test forms. If the two test forms are
not of equal difficulty, or if the items are of two different types
(say one test is a completion test and one a multiple-choice test
on the same subject matter), or if the subject matter differs from
test to test, then the two forms are not measuring the same $\tau$.

The purpose of allowing a time interval to elapse between the 
administration of the two tests is to permit temporary states of 
attention, fatigue, well-being and the like in the subjects, and 
accidental advantages or disadvantages in the environment to 
register differences in the scores. Therefore, the time interval 
should not be great enough for the real abilities of the subjects to 
undergo a change, for in that case the two scores would not be 
representing the same $\tau$. If the second test is given immediately
after the first, temporary states of well-being of the subjects will
have similar effect on both scores, $s_e^2$ will be underestimated and
$r_{xx}$ overestimated.

If two comparable test forms are available and if it is admin-
istratively feasible to allow time for testing on two different days,
this procedure appears to give the most satisfactory estimate of
$r_{xx}$. The error variance then represents variation due both to
selection of test items and to fluctuation in performance of sub-
jects.

B. Correlating scores on repetitions of the same test form. In 
this procedure the task is precisely the same on the two administra- 
tions. Therefore, if the sampling of test items favors a particular 
subject unduly on one occasion it does so on both, and departures
from his true score because of sampling of test items do not show
up in the error variance but work to increase $r_{xx}$ unduly. In this
procedure the error variance is due to differences in the per-
formance of the subjects on two occasions, so $r_{xx}$ is measuring the
stability of the subjects’ performance rather than the reliability of
the test. If in the interval between repetitions, some of the sub-
jects have had more practice than others in the function tested or
have had some experience which changed them in respect to that
function, the corresponding changes in score will appear as errors
when actually they represent changes in $\tau$. Then $s_e^2$ will be
overestimated and $r_{xx}$ underestimated.

Sometimes the only error in which the investigator is interested 
is performance error. For example, consider the archery data 
presented in Chapters 9 and 14. The essential task is uniform, 
shooting at a uniform target under conditions as nearly standard 
as they can be made. The stability of performance of the subjects 
is properly the aspect of reliability to be studied. In measures of 
posture, of metabolism, of reaction time, or in general of physical 
states or simple skills, the essential kind of error is performance 
error in the subject and perhaps also in the person administering 
the test. No sampling of test items is called for.

In measures of knowledge, achievement, temperament, atti-
tude, opinion, and the like, repetition of the same test form usu-
ally greatly underestimates $s_e^2$ and overestimates $r_{xx}$. Errors in
measuring an individual because the test items are a sample out
of all possible items should contribute to $s_e^2$ but cannot do so when
the same test form is repeated. Furthermore, there is the risk that
some memory of answers given on the first test may affect answers
on the second, thus further reducing $s_e^2$ and fallaciously increasing
$r_{xx}$.

C. Correlating scores on two test forms given on the same oc- 
casion. (a) When two distinct test forms are available. The first 
paragraph of the discussion under A applies here also. The error 
variance includes variance due to the sampling of test items. 
Variance due to instability of performance of individual subjects
does not appear as error but as part of the measure of $\tau$ in both
tests thus raising the estimate of $r_{xx}$. (b) When only one test form
is available and two scores are obtained from it by subdivision of the 
items. This procedure is very widely used. Perhaps there is no 
second form of the test. Perhaps time is not available for admin- 
istering two complete forms. Perhaps a test maker is developing 
an instrument and needs a preliminary estimate of its reliability 
before he carries out the work of standardizing two comparable 
forms. The single test form is administered, its items are in some 
manner divided so as to form two half-tests, and the scores on 
these half-tests are correlated. Then the Spearman-Brown
Prophecy Formula, Formula (12.36), is applied,

    $r_{2X\,2X} = \dfrac{2r_{xx}}{1 + r_{xx}}$

with the correlation between the half-tests used as $r_{xx}$. The
resulting value of $r_{2X\,2X}$ is then assumed to be the correlation which
would have been obtained between scores on two full test forms
had such been available. Certain comments on the method should
be made.

(1) The error variance takes account of the sampling of test 
items but not of the variation in subjects’ performance from time 
to time. 

(2) Sometimes it is impossible to subdivide a test into com- 
parable halves. Such is the case when success on one item is 
conditioned by success on previous items, or when speed plays an 
important role. If a test is to be timed, the half-tests should be 
constructed in advance, and each half-test given independently 
with its own time limits. Suppose a test consists of four blocks of 
questions. For example, suppose a reading test consists of 4 long 
passages with several questions on each. If half the passages and 
all the questions on each of them are placed in one form, the 
correlation between the two forms may be low. The application 
of the Prophecy Formula will in that case suggest the increase in 
reliability to be expected if the test is increased by the addition 
of new passages. If half the questions on every passage are placed 
in one form, the correlation may be quite high. The application 
of the Prophecy Formula can in this case be used only to estimate 
the correlation between two test forms obtained by the addition 
of new questions based on the same set of passages. As it is usually
not possible to find a large number of new and equally good
questions, and as one is more likely to want to generalize to a
universe of passages than to a universe of questions about a fixed
set of passages, this second method of subdividing items has little
to recommend it.

(3) The number of ways in which the items of a test can be 
subdivided to produce two half-tests is very large, and each of them 
produces a different correlation between the test halves. If, for 
example, a test contains 20 items, the number of ways in which 
the items can be assigned to the two half-tests is 92378. 

(4) The Spearman-Brown Prophecy Formula is derived from 
rather restrictive assumptions (stated earlier in this chapter) to 
which the practical situation may not conform. 
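
Two of the points above lend themselves to a quick check: the stepping up of a half-test correlation by the Spearman-Brown formula, and the count of possible half-test splits in comment (3). The sketch below uses illustrative function names, and the half-test correlation of .60 is a hypothetical value.

```python
import math

def spearman_brown(r_half):
    """Estimated full-length reliability from the correlation
    between two half-tests: 2r / (1 + r)."""
    return 2 * r_half / (1 + r_half)

def number_of_splits(m):
    """Number of ways to divide m items (m even) into two half-tests
    of m/2 items each, the two halves being interchangeable."""
    return math.comb(m, m // 2) // 2

print(number_of_splits(20))             # 92378
print(round(spearman_brown(0.60), 2))   # 0.75
```

The division by 2 in the count reflects the fact that exchanging the two halves does not produce a new split.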

D. Measuring the internal consistency of a test. The preceding 
section referred to the great variety of ways in which a test can be 
subdivided to form two half-tests and the consequent indeter- 
minacy of the estimate of the reliability coefficient obtained by 
correlating scores on the halves. This indeterminacy has caused 
research workers to look for a method which is independent of 
any particular split-up of items. A formula which serves this pur-
pose was developed by Kuder and Richardson.⁷ The symbols used
here are slightly different from those in the original presentation.

N = number of individuals taking the test.

m = number of items in the test.

$p_i$ = proportion of individuals answering the ith item cor-
rectly.

$Np_i$ = number of individuals answering the ith item correctly.

$q_i = 1 - p_i$ = proportion of individuals not answering the ith
item correctly.

$Nq_i$ = number of individuals not answering the ith item cor-
rectly.

$s_x^2$ = variance of the test scores of the N individuals if each
score is the number of items answered correctly by the
individual.

(12.43)  Then  $r_{xx} = \dfrac{m}{m-1}\left(1 - \dfrac{N\sum p_iq_i}{(N-1)s_x^2}\right)$

(12.44)        $= \dfrac{m}{m-1}\left(1 - \dfrac{N(\sum p_i - \sum p_i^2)}{(N-1)s_x^2}\right)$

(12.45)        $= \dfrac{m}{m-1}\left(1 - \dfrac{\sum (Np_i)(Nq_i)}{N(N-1)s_x^2}\right)$


Formulas (12.43), (12.44), and (12.45) are equivalent. They 
provide a measure of the internal consistency of the items in the 
test. If data come to the computer in terms of the proportion of
persons answering an item correctly, Formula (12.44) provides a
convenient routine for machine computation since $\sum p_i$ and $\sum p_i^2$ can
be obtained simultaneously. If the number of persons answering
each item correctly is available, Formula (12.45) should be used
because it involves less computational labor and less error from
rounding.

These formulas provide a kind of average of the various reli- 
ability coefficients which could be obtained from all the possible 
ways of subdividing the m test items, but that fact is not imme- 
diately obvious and requires more algebra for its development 
than it seems appropriate to introduce here. 

The variance of scores, $s_x^2$, can be written as the sum of all item
covariances and variances if each correct item is scored 1 and each
incorrect item 0. In such scoring the variance of an item is
$Np_iq_i/(N-1)$. Hence $\sum Np_iq_i/(N-1)$ is the sum of item variances
in the sample. In Formula (12.43) the quantity $N\sum p_iq_i/(N-1)s_x^2$
is the sum of item variances divided by the sum of item variances
and covariances, each covariance being based on a pair of items.

If the sum of all the covariances were zero, the numerator and 
denominator would be equal, the quantity in parentheses would 
be zero and r would be zero. However this situation is very 
unlikely for it is almost inconceivable that a test maker would 
put together a set of wholly unrelated items. It is still less likely 
that he would formulate a test in which many of the inter-item 
correlations would be negative. 

If the items are all measures of the same variable they should
show positive correlations. If these inter-item correlations are
high and positive, the inter-item covariances are large and positive,
and the denominator $s_x^2$ which is the sum of variances and covari-
ances is much larger than the numerator $N\sum p_iq_i/(N-1)$ which is
the sum of variances only. Then the fraction $N\sum p_iq_i/(N-1)s_x^2$ is small
(though it cannot be zero) and $r_{xx}$ is near to 1. In this case the
items are highly consistent with each other.

Kuder and Richardson⁷ have given several variations of For-
mula (12.43) under different assumptions. Other proofs have been
given by Jackson⁵ and by Hoyt.⁴

As an application, consider a test of 12 items given to 9 subjects,
each item being scored 1 if answered correctly and 0 if answered
incorrectly as in Table 12.2. A genuine problem would be likely
to involve a far greater number of subjects, but the pattern of
procedure is the same, and the reader will learn as much from a
small computation as from a long one.


TABLE 12.2. Record of 9 Subjects on 12 Items

[Record on item i made by subject $\alpha$ (1 or 0), and the row of subject
scores $X_\alpha$: entries illegible in this copy. The item summary columns
follow.]

Item i    $Np_i$    $Nq_i$    $p_i$
   1        5         4       .556
   2        5         4       .556
   3        7         2       .778
   4        8         1       .889
   5        7         2       .778
   6        4         5       .444
   7        7         2       .778
   8        6         3       .667
   9        7         2       .778
  10        6         3       .667
  11        6         3       .667
  12        4         5       .444
 Sum       72        36      8.002

Summing the columns of Table 12.2 produces the scores of the
subjects, $X_\alpha$. For these 9 scores,

    $\sum (X - \bar{X})^2 = (N - 1)s_x^2 = 60.$

Summing the rows produces the item scores $Np_i$. Subtracting
each $Np_i$ from N = 9 produces $Nq_i$. As a check we note that

    $\sum_{i=1}^{12} Np_i = 72 = \sum_{\alpha=1}^{9} X_\alpha$,

that $\sum Nq_i = 36$,
and that $\sum Np_i + \sum Nq_i = 108 = 9 \times 12$.

If the computation is completed by Formula (12.45) we first
find $\sum (Np_i)(Nq_i) = 20 + 20 + 14 + 8 + \cdots + 18 + 20 = 198$ and

    $r_{xx} = \dfrac{12}{11}\left(1 - \dfrac{198}{9(60)}\right) = .69$

If the computation is completed by Formula (12.44), we obtain
the column $p_i$ by dividing the $Np_i$ column by N = 9. It is neces-
sary to carry these values out to several decimal places to reduce
the effect of rounding error. Then $\sum p_i = 8.002$ is checked against
the expected value $\frac{72}{9} = 8$. $\sum p_i^2 = 5.559$.

    $r_{xx} = \dfrac{12}{11}\left(1 - \dfrac{9(8.002 - 5.559)}{60}\right) = .69$
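
The computation by Formula (12.45) can be reproduced in a few lines. This is a sketch only: the function name is illustrative, and the item counts $Np_i$ are taken as 9 times the $p_i$ values of Table 12.2 (an inference that agrees with the checks $\sum Np_i = 72$ and $\sum (Np_i)(Nq_i) = 198$ given in the text).

```python
def kuder_richardson(correct_counts, n_subjects, sum_sq_dev):
    """Internal-consistency coefficient by Formula (12.45).

    correct_counts : number of subjects answering each item correctly (N p_i)
    n_subjects     : N
    sum_sq_dev     : sum of squared deviations of total scores, (N - 1) s_x^2
    """
    m = len(correct_counts)
    n = n_subjects
    # Sum over items of (N p_i)(N q_i), with N q_i = N - N p_i
    sum_np_nq = sum(c * (n - c) for c in correct_counts)
    return m / (m - 1) * (1 - sum_np_nq / (n * sum_sq_dev))

counts = [5, 5, 7, 8, 7, 4, 7, 6, 7, 6, 6, 4]   # N p_i for the 12 items
r_xx = kuder_richardson(counts, n_subjects=9, sum_sq_dev=60)

print(sum(counts))                       # 72
print(sum(c * (9 - c) for c in counts))  # 198
print(round(r_xx, 2))                    # 0.69
```

Working from the integer counts avoids the rounding of the $p_i$ that Formula (12.44) requires, which is the advantage the text attributes to Formula (12.45).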


REFERENCES

1. Brown, William, “Some Experimental Results in the Correlation of Mental
   Abilities,” British Journal of Psychology, 3 (1910), 296-322.

2. Cronbach, Lee J., Essentials of Psychological Testing, New York, Harper &
   Brothers, 1949.

3. Gulliksen, Harold, Theory of Mental Tests, New York, John Wiley and Sons,
   1950.

4. Hoyt, Cyril, “Test Reliability Obtained by Analysis of Variance,” Psycho-
   metrika, 6 (1941), 153-160.

5. Jackson, Robert W. B., Application of the Analysis of Variance and Covariance
   Method to Educational Problems, Bulletin No. 11, Department of Educational
   Research, University of Toronto, 1940.

6. Jackson, Robert W. B. and Ferguson, George A., Studies on the Reliability of
   Tests, Bulletin No. 12 of the Department of Educational Research, University
   of Toronto, 1941.

7. Kuder, G. F. and Richardson, M. W., “The Theory of the Estimation of Test
   Reliability,” Psychometrika, 2 (1937), 151-160.

8. Lindquist, E. F., Educational Measurement, Washington, D.C., American
   Council on Education, 1951.

9. Spearman, Charles, “Correlation Calculated from Faulty Data,” British
   Journal of Psychology, 3 (1910), 271-295.

10. Stouffer, Samuel A., et al., Measurement and Prediction (Studies in Social
    Psychology in World War II, Vol. 4), Princeton, N.J., Princeton University
    Press, 1950.

11. Thorndike, Robert L., Personnel Selection; Test and Measurement Techniques,
    New York, John Wiley and Sons, 1949.


13  Multiple Regression and Correlation


Methods of using scores on one variable to predict 
scores on another related variable were discussed in Chapter 10, 
and tests of significance for the correlation coefficient and the 
regression coefficient were given there. In this chapter scores 
on two or more variables will be combined to predict scores on 
another variable called the criterion and the following questions 
will be considered: What weight should be assigned to each of the 
predictor variables in order to obtain the best estimate of the 
criterion variable? How good is that “best estimate”? What
computational routines are economical and efficient? How can 
significance tests be made for the coefficients in the prediction 
equation and for the coefficient of multiple correlation? 
In this chapter it will be assumed that all correlation coeffi- 
cients are product moment coefficients as defined in Chapter 10. 
Prediction of Semester Grade in a First Course in Statistics.
In order to estimate in advance the amount of difficulty students
were likely to have in an introductory course in statistical methods,
three prognostic tests were administered at the first session. One
of these was a 45-minute test in reading difficult material. One
was a specially constructed test of simple arithmetic and algebraic
relationships of the sort most often encountered in statistical
problems. One was an artificial language test involving unfamiliar
symbols in logical systems of operation. Three class sections
were available. Sections I and III met at the same evening period
for a total of 3 hours a week. Students with good mathematical
preparation and high scores on the prognostic tests were assigned
to Section I while students who anticipated difficulty because of
poor preparation and who had low scores on the tests were assigned
to Section III, which was considerably smaller than Section I.
Section II met in the morning for a total of 4 hours a week, and
included students with widely varying background. The three
sections were taught by three different teachers. When these

TABLE 13.1 Scores Made by Students in a First Course in Statistical Methods
on Three Prognostic Tests Given at Beginning of Term, Two Examinations in
Subject Matter of the Course, and the Semester Grade (* indicates a woman
student. ? marks an entry illegible in this copy.)

                    Prognostic Test Score        Criterion Score
Code                       Artificial  Arith-  Midterm  Final  Semester
number  Section  Reading   language    metic   test     exam   grade
   1    III        33        51         33      58       65      62
   2    III        39        50         36      53       51      52
   3    I          38        38         39      52       53      53
   4    I          23        27         29      47       42      45
  *5    I          44        40         41      61       50      56
  *6    I          45        54         38      47       53      50
   7    I          48        57         42      64       64      64
   8    II         38        39         35      64       54      59
   9    II         32        46         23      58       50      54
 *10    II         28        45         37      44       45      45
  11    II         33        53         28      62       63      63
  12    II         40        43         25      52       50      51
  13    ?          34        55         38      50       56      58
  14    II         34        57         36      71       68      70
  15    II         35        46         37      62       65      64
  16    I          32        43         39      52       59      56
  17    II         34        44         38      59       57      58
  18    II         42        51         35      53       60      57
  19    I          32        55         36      62       56      59
 *20    III        24        34         17      37       43      40
  21    II         39        52         34      55       53      54
  22    I          34        25         29      30       42      36
  23    II         44        58         35      61       56      59
  24    III        27        29         26      49       40      45
  25    II         34        52         30      43       38      41
  26    II         40        53         29      61       54      58
  27    ?          37        34         26      47       57      52
  28    III        22        40         33      47       50      49
  29    I          30        54         40      56       47      52
 *30    I          36        58         25      52       39      45
  31    I          35        43         40      49       37      43
  32    I          39        58         32      49       37      43
  33    II         37        28         14      43       36      40
  34    I          38        51         26      40       42      41
  35    I          36        55         31      59       52      56
  36    I          46        52         39      61       63      62
  37    II         44        57         32      46       53      50
  38    ?          40        56         32      64       49      57
  39    III        38        30         18      25       38      32
  40    I          50        47         32      58       52      55
  41    II         16        49         26      52       36      43
  42    I          46        60         43      61       57      59
  43    I          42        56         40      59       54      57
 *44    III        27        53         26      62       48      55
  45    III        33        55         25      55       57      56
  46    II         40        54         34      52       57      55
  47    I          46        53         37      62       49      56
 *48    I          46        58         39      58       56      57
 *49    ?          25        57         24      46       49      48
 *50    III        26        45         18      47       25      36
 *51    I          42        46         31      58       56      57
 *52    I          34        60         33      62       64      63
  53    III        28        37          8      47       46      47
  54    II         21        24         16      44       41      43
  55    I          44        43         35      58       55      57
  56    I          38        47         28      58       56      57
  57    III        16        50         27      47       47      47
  58    III        40        48         35      56       38      47
  59    III        33        42         25      52       46      49
 *60    II         35        59         25      52       49      51
  61    III        33        40         28      49       47      48
  62    III        30        41         30      56       59      58
  63    I          38        47         39      62       63      63
 *64    III        36        43         26      61       50      56
 *65    I          45        57         35      53       46      50
  66    III        32        32         21      50       41      46
  67    III        36        53         27      49       34      43
 *68    III        21        46         39      44       33      39
  69    II         30        54         34      47       46      47
  70    II         34        49         27      50       44      48
  71    II         41        58         39      53       57      55
  72    I          44        55         38      47       46      47
 *73    II         36        14         20      30       40      35
  74    I          58        50         38      68       55      62
  75    II         36        41         30      62       56      59
  76    III        31        14         18      27       30      29
  77    II         40        46         36      58       50      54
  78    I          44        60         43      71       45      58
 *79    III        36        36         23      49       43      46
  80    I          44        51         36      53       47      50
 *81    I          34        46         37      43       49      46
 *82    I          38        56         36      64       61      63
 *83    I          38        52         33      50       56      58
  84    I          38        51         33      50       48      49
  85    I          46        51         40      62       54      ?
  86    III        30        42         22      37       33      ?
 *87    III        26        44         28      58       48      ?
 *88    I          36        58         42      59       69      ?
  89    I          36        43         42      46       44      ?
 *90    III        32        59         36      61       55      ?
 *91    III        30        23         28      38       38      ?
 *92    I          31        58         31      44       51      ?
  93    II         44        34         32      55       49      ?
 *94    III        14        16         18      40       33      37
  95    III        44        26         30      37       19      28
 *96    I          30        58         37      61       ?       ?
  97    I          39        34         33      41       46      ?
  98    I          32        56         30      65       51      58


tests were first given, only more or less intuitive speculation 
could be used to indicate how scores might be interpreted to 
predict success in the course as no criterion was then available. 
At the end of the semester, scores on a midterm test and on a final 
test and the average of these two tests recorded as the semester 
mark provided criteria against which the effectiveness of the 
prognostic tests could be studied. A formula thus developed to 
estimate one or the other of these criteria from the three place- 
ment tests could then be used in giving advice to similar classes 
in subsequent terms. 

The data for all students who had scores on every test are 
shown in Table 13.1. An asterisk attached to a code number 
indicates a woman student. The figures in this table can be 
analysed in a variety of ways. In this chapter we shall consider 
problems related to the prediction of a criterion score from the 
three prognostic test scores. 

Simplified Problem with Two Predictors. In order to help 
the student gain clear concepts of the procedure and the meaning 
of estimating one variable from several others and the residual 
errors involved in that process, a very small problem will now be 
worked out with only two predictors and a small number of cases 
selected at random from the data of Table 13.1. For this problem 
we shall predict score on the midterm test (Y) from score on the
reading test ($X_1$) and score on the artificial language test ($X_2$).

Multiple Regression Equation with Two Predictors. To 
combine the two predicting variables (also called the independent 
variables) a multiple regression equation is used. For two pre- 
dictors, the general form of this equation is 


(13.1)    $\tilde{Y}_\alpha = A + b_{y1.2}X_{1\alpha} + b_{y2.1}X_{2\alpha}$


The notation of this equation is to be interpreted as follows:

$X_{1\alpha}$ and $X_{2\alpha}$ are observed scores for the $\alpha$th individual.

A is a number to be computed from the sample data.

$b_{y1.2}$ is a number to be computed from the sample data indicat-
ing the weight given to $X_1$ in the regression equation.
It is called a partial regression coefficient.

$b_{y2.1}$ is a number to be computed from the sample data. It
is the partial regression coefficient which expresses the
weight given to $X_2$ in the regression equation.

$\tilde{Y}_\alpha$ is the value of Y estimated from $X_{1\alpha}$ and $X_{2\alpha}$ by the regres-
sion equation.


The formulas for $b_{y1.2}$, $b_{y2.1}$, and A will now be given without
justification and the rationale will be developed in a later
section.

(13.2)    $b_{y1.2} = \dfrac{r_{y1} - r_{y2}r_{12}}{1 - r_{12}^2}\cdot\dfrac{s_y}{s_1}$   and   $b_{y2.1} = \dfrac{r_{y2} - r_{y1}r_{12}}{1 - r_{12}^2}\cdot\dfrac{s_y}{s_2}$

(13.3)    $A = \bar{Y} - b_{y1.2}\bar{X}_1 - b_{y2.1}\bar{X}_2$


In order to display vividly the relations involved, 10 cases were 
chosen at random from the 98 cases of Table 13.1 and computations 
carried through for these 10 cases. 

The mean and standard deviation of each trait are shown in
the lower part of Table 13.2, also the three correlation coefficients.
When these values are substituted in Formula (13.2) the values
of the two regression coefficients are found to be

    $b_{y1.2} = -.006$        $b_{y2.1} = .134$

Substitution of the means and $b_{y1.2}$ and $b_{y2.1}$ in Formula (13.3)
gives

    A = 51.2 + (.006)(34.1) - (.134)(49.2) = 44.9

(The figure 44.9 comes from carrying the coefficients to more
decimal places; the rounded values shown give 44.8.)

The regression equation is therefore $\tilde{Y} = 44.9 - .006X_1 + .134X_2$.
This equation should now be applied in turn to each of the ten
individuals listed in Table 13.2, estimating his midterm test score
Y on the basis of his two observed prognostic scores. These
predictions are listed in the column headed $\tilde{Y}$ and the student
should verify some of them. Errors of estimate are listed in the
column $Y - \tilde{Y}$ and should be verified by the student.
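
The verification suggested above can be sketched as follows, using the ten pairs of prognostic scores and the midterm scores of Table 13.2. The helper names are illustrative only. Because the correlations are carried here at full precision rather than rounded to three places, the coefficients may differ from the printed ones in the third decimal place.

```python
import math

# Ten cases from Table 13.2
x1 = [39, 32, 32, 40, 32, 16, 22, 46, 38, 44]   # reading
x2 = [58, 43, 32, 54, 55, 49, 40, 53, 51, 57]   # artificial language
y  = [49, 52, 50, 52, 62, 52, 47, 62, 40, 46]   # midterm test

def mean(v):
    return sum(v) / len(v)

def sd(v):
    """Sample standard deviation with N - 1 in the denominator."""
    m = mean(v)
    return math.sqrt(sum((a - m) ** 2 for a in v) / (len(v) - 1))

def corr(u, v):
    """Product moment correlation coefficient."""
    mu, mv = mean(u), mean(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = math.sqrt(sum((a - mu) ** 2 for a in u)
                    * sum((b - mv) ** 2 for b in v))
    return num / den

r12, ry1, ry2 = corr(x1, x2), corr(y, x1), corr(y, x2)
s1, s2, sy = sd(x1), sd(x2), sd(y)

# Formula (13.2): partial regression coefficients
b1 = (ry1 - ry2 * r12) / (1 - r12 ** 2) * sy / s1
b2 = (ry2 - ry1 * r12) / (1 - r12 ** 2) * sy / s2
# Formula (13.3): constant term
a = mean(y) - b1 * mean(x1) - b2 * mean(x2)

predicted = [a + b1 * u + b2 * v for u, v in zip(x1, x2)]
residuals = [obs - est for obs, est in zip(y, predicted)]

print(round(r12, 3), round(ry1, 3), round(ry2, 3))
print(round(b1, 4), round(b2, 4), round(a, 2))
```

The predicted values sum to the same total as the observed scores, so the residuals sum to zero, as the Sum row of Table 13.2 shows.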

When the variables are expressed in standard score form, we 
shall attach a star to the symbol for the regression coefficient. 
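In this case the constant A disappears from the equation leaving it

(13.4)    $\dfrac{\tilde{Y}_\alpha - \bar{Y}}{s_y} = b^*_{y1.2}\,\dfrac{X_{1\alpha} - \bar{X}_1}{s_1} + b^*_{y2.1}\,\dfrac{X_{2\alpha} - \bar{X}_2}{s_2}$

(13.5)    where  $b^*_{y1.2} = \dfrac{r_{y1} - r_{y2}r_{12}}{1 - r_{12}^2}$   and   $b^*_{y2.1} = \dfrac{r_{y2} - r_{y1}r_{12}}{1 - r_{12}^2}$

For the data of Table 13.2 therefore $b^*_{y1.2} = -.0084$ and $b^*_{y2.1} = .165$.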
Thus b*’s like the r’s are pure numbers which do not involve
the scale of measurement of the traits while the b’s are affected
by the scale of measurement. For all work in which a regression
equation is used to predict scores the b’s must be used unless all
variables are in standard score form. For almost all other prob-
lems, including tests of significance, it is more convenient to use
the b*’s. In later sections of this chapter several such problems


TABLE 13.2 Scores for Ten Cases Selected at Random
from Table 13.1 and Certain Statistics Derived from Those Scores

          $X_1$      $X_2$        Y        $\tilde{Y}$       $Y - \tilde{Y}$     $Y\tilde{Y}$
Case     Reading   Artificial  Midterm  Predicted   Residual
                   language    test     value
 32        39        58         49       52.34       -3.34     2564.66
 16        32        43         52       50.39        1.61     2620.28
 66        32        32         50       48.93        1.07     2446.50
 46        40        54         52       51.80         .20     2693.60
 19        32        55         62       51.98       10.02     3222.76
 41        16        49         52       51.29         .71     2667.08
 28        22        40         47       50.06       -3.06     2352.82
 47        46        53         62       51.63       10.37     3201.06
 34        38        51         40       51.41      -11.41     2056.40
 37        44        57         46       52.17       -6.17     2399.82

Sum                     341       492       512      512.00       0      26224.98
Sum of squares        12429     24838     26626    26224.95     400.99
(Sum)²/N              11628.1   24206.4   26214.4  26214.4        0
Sum of squares
  of deviations         800.9     631.6     411.6     10.55     400.99

$\sum X_1X_2 = 17130$    $r_{12} = .496$    $\bar{X}_1 = 34.1$    $s_1^2 = 88.99$    $s_1 = 9.43$
$\sum X_1Y = 17501$      $r_{y1} = .073$    $\bar{X}_2 = 49.2$    $s_2^2 = 70.18$    $s_2 = 8.38$
$\sum X_2Y = 25272$      $r_{y2} = .160$    $\bar{Y} = 51.2$      $s_y^2 = 45.73$    $s_y = 6.76$

$b^*_{y1.2} = \dfrac{r_{y1} - r_{y2}r_{12}}{1 - r_{12}^2} = -.00843$        $b_{y1.2} = b^*_{y1.2}\dfrac{s_y}{s_1} = -.00604$

$b^*_{y2.1} = \dfrac{r_{y2} - r_{y1}r_{12}}{1 - r_{12}^2} = .165$           $b_{y2.1} = b^*_{y2.1}\dfrac{s_y}{s_2} = .134$


will be considered. The symbol $\beta$ is in fairly general use for the
value here denoted as b*, having been used before there was any
general recognition of the desirability of using different symbols
for population parameters and sample statistics. In this text $\beta^*$
represents the population value corresponding to the sample value
b* and $\beta$ the population value corresponding to b.




Effectiveness of Prediction by a Multiple Regression Equation. 
A multiple regression equation may involve a large number of 
terms and a great deal of computation and still be relatively 
useless. Two different questions have probably occurred to the 
reader before now: (1) How effective was this equation in estimat- 
ing midterm scores for these ten students? (2) If applied to 
another random sample out of the 98 students or if applied to 
another class another term, how effective would the predictions be? 

Unless the equation is effective with the group on which it was 
obtained it cannot be expected to be useful with another group. 
However it may give very close estimations for the group on which 
it was obtained and be much less effective with a new group. The 
two issues must be considered separately. 

Intuitively one might say that a regression equation gives a 
satisfactory estimation for the group on which it was obtained if 
there is a high correlation between observed and estimated scores, 
or if most of the variation of observed scores can be ascribed to 
regression and very little to errors of estimate. These two inter- 
pretations will now be considered in relation to the data of Table 
13.2 and will be seen to be really the same interpretation. 

Multiple Correlation. The coefficient of correlation between 
observed scores on some trait and scores predicted for that trait 
by a multiple regression equation is called a coefficient of multiple 
correlation or a multiple correlation coefficient. For the data of Table 
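13.2 this would be the correlation between scores in the columns
$Y$ and $\hat{Y}$. From the final column in that table $\Sigma Y\hat{Y} = 26224.98$.
Since the mean of $\hat{Y}$ is the same as $\bar{Y}$,

$$\Sigma (Y - \bar{Y})(\hat{Y} - \bar{Y}) = \Sigma Y\hat{Y} - (\Sigma Y)^2/N = 26224.98 - (512)^2/10 = 10.58$$

and

$$r_{Y\hat{Y}} = \frac{\Sigma Y\hat{Y} - (\Sigma Y)^2/N}{\sqrt{\Sigma (Y - \bar{Y})^2 \; \Sigma (\hat{Y} - \bar{Y})^2}} = \frac{10.58}{\sqrt{(411.6)(10.55)}} = .16$$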


The symbol for this multiple correlation will be written $R_{y.12}$. The
single primary subscript standing to the left of the dot names the
variable whose observed and estimated scores are being correlated.
The two secondary subscripts to the right of the dot name the
variables used as predictors in the regression equation. Other
symbols sometimes used with the same meaning as $R_{y.12}$ are


Ryan, Тулу Ти, Тода» and Жол. 


322 - Multiple Regression and Correlation 


The direct computation described above was employed only
for the purpose of making vivid to the student the concept of
what a multiple r means. Several formulas algebraically equivalent
to that process are available for obtaining $R_{y.12}$ more economically.
Of these one of the most convenient to use is

(13.6)  $R_{y.12} = \sqrt{r_{y1} b^*_{y1.2} + r_{y2} b^*_{y2.1}}$

Substitution of $r_{y1} = .073$, $r_{y2} = .160$, $b^*_{y1.2} = -.0084$, and
$b^*_{y2.1} = .165$ as already found, yields $R_{y.12} = \sqrt{.0256} = .16$ which
agrees with the result previously obtained by more laborious computation.
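
The agreement between the two routes can be verified numerically. The following is a minimal sketch in Python, not part of the original text: it takes the scores of Table 13.2, assumes the standard two-predictor solutions for the b*'s, and computes the multiple correlation once by Formula (13.6) and once by correlating observed with predicted scores. All function and variable names are illustrative.

```python
from math import sqrt

# Scores for the ten cases of Table 13.2.
x1 = [39, 32, 32, 40, 32, 16, 22, 46, 38, 44]   # reading
x2 = [58, 43, 32, 54, 55, 49, 40, 53, 51, 57]   # artificial language
y = [49, 52, 50, 52, 62, 52, 47, 62, 40, 46]    # midterm test


def sum_products(a, b):
    """Sum of products of deviations from the two means."""
    n = len(a)
    return sum(p * q for p, q in zip(a, b)) - sum(a) * sum(b) / n


def correlation(a, b):
    """Product-moment correlation of two equally long lists."""
    return sum_products(a, b) / sqrt(sum_products(a, a) * sum_products(b, b))


r12, ry1, ry2 = correlation(x1, x2), correlation(y, x1), correlation(y, x2)

# Standard partial regression coefficients for two predictors
# (assumed here to be the usual two-predictor solution).
b1_star = (ry1 - ry2 * r12) / (1 - r12 ** 2)
b2_star = (ry2 - ry1 * r12) / (1 - r12 ** 2)

# Route 1: Formula (13.6).
R_formula = sqrt(ry1 * b1_star + ry2 * b2_star)

# Route 2: predict every Y, then correlate Y with the predictions.
# Square roots of sums of squares stand in for standard deviations,
# because only their ratios are needed.
dev_y, dev_1, dev_2 = (sqrt(sum_products(v, v)) for v in (y, x1, x2))
b1, b2 = b1_star * dev_y / dev_1, b2_star * dev_y / dev_2
n = len(y)
a = sum(y) / n - b1 * sum(x1) / n - b2 * sum(x2) / n
y_hat = [a + b1 * u + b2 * v for u, v in zip(x1, x2)]
R_direct = correlation(y, y_hat)

print(round(R_formula, 2), round(R_direct, 2))
```

Both routes give the same value to two decimals because the formula is an algebraic identity, not an approximation.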

Partition of the Sum of Squares. Reference to the lower 
portion of Table 13.2 shows the following sums of squares: 
Sum of squares of regressed values about $\bar{Y}$,   $\Sigma(\hat{Y} - \bar{Y})^2$ =  10.555
Sum of squares of residual errors,                   $\Sigma(Y - \hat{Y})^2$ = 400.995
                                                                              -------
                                                                              411.550
Total = Sum of squares of scores about $\bar{Y}$,     $\Sigma(Y - \bar{Y})^2$ = 411.6


This partition of the total sum of squares is reminiscent of 
similar relations encountered in Chapters 9 and 10. It may be 
described by the formula 


(13.7)  $\Sigma(Y - \bar{Y})^2 = R^2_{y.12}\,\Sigma(Y - \bar{Y})^2 + (1 - R^2_{y.12})\,\Sigma(Y - \bar{Y})^2$

For these data

$$\Sigma(Y - \bar{Y})^2 = (.0256)(411.6) + (1 - .0256)(411.6) = 10.54 + 401.06$$

Except for rounding errors, the result of substituting $R^2_{y.12}$ and
$\Sigma(Y - \bar{Y})^2$ in Formula (13.7) agrees with the sums of squares read
from Table 13.2. The partition of the sum of squares may now 
be interpreted in two ways. 
A. Interpretation as to effectiveness of prediction for observed 
group. From Formula (13.7) it appears that for this group 
$R^2_{y.12}$     = proportion of the sum of squares of midterm marks
                 which can be ascribed to variation in prognostic
                 score
               = proportion of variation (measured as sum of
                 squares) of midterm marks which might be eliminated
                 if all cases were selected to have the same
                 prognostic test score
$1 - R^2_{y.12}$ = proportion of variation in midterm marks which
                 is independent of variation in prognostic test
                 scores and so must be ascribed to other sources
                 of variation
               = proportion of variation in midterm marks which
                 would remain even in a group uniform as to
                 prognostic test score.

The question of whether the regression equation does or does 
not provide effective prediction may now be answered in terms 
of a value judgment as to how much unexplained variation can 
be tolerated in the particular circumstances in which one is work- 
ing. Anyone who thinks realistically about achievement of 
students on a college course would think of many sources of 
variation unrelated to scores on any prognostic test, as, for 
example, differences in motivation and interest, in study habits, 
in time available for study, in other abilities such as mathematical 
background and general level of intelligence not measured by 
these tests, in personal vicissitudes during the term. He would 
also think of sources of measurement error affecting scores on the 
criterion and on the prognostic tests such as differences in pre- 
vious experience of the student with such tests, his state of mind 
and body on the day the tests were given, the brevity of the tests 
and the particular selection of material in them, and so on. All 
such sources of error contribute to the proportion of variation 
$1 - R^2_{y.12}$. The person using the regression equation must decide
whether the reduction in variation represented as $R^2_{y.12}$ represents
an increase of information sufficient to justify the time required 
for giving the tests, scoring them, and analyzing results. For 
the present data of course it must be presumed that inclusion of 
the third prognostic test in the regression equation, as is done 
later in this chapter, will improve the predictions. For the ten 
cases considered here, the proportional reduction in variance of 
.0256 is too slight to be worth the trouble of giving and scoring 
the tests. However, it will presently be seen that for 98 cases 
the multiple correlation is much higher, in fact .66. This sample, 
which was actually drawn at random, happened to be an extreme 
deviate. 
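
The partition and the proportion it implies can be recomputed from the observed and predicted columns of Table 13.2. The sketch below is an illustration only; the variable names are assumptions, and the predicted values are taken rounded to two decimals, so the two parts add up to the total only within rounding error.

```python
# Observed midterm scores and predicted values for the ten cases
# of Table 13.2 (predictions rounded to two decimals).
y = [49, 52, 50, 52, 62, 52, 47, 62, 40, 46]
y_hat = [52.34, 50.39, 48.93, 51.80, 51.98,
         51.29, 50.06, 51.63, 51.41, 52.17]

mean_y = sum(y) / len(y)

# Total sum of squares of observed scores about their mean.
ss_total = sum((v - mean_y) ** 2 for v in y)
# Part ascribed to regression: predicted values about the same mean.
ss_regression = sum((p - mean_y) ** 2 for p in y_hat)
# Part left as errors of estimate.
ss_residual = sum((v - p) ** 2 for v, p in zip(y, y_hat))

# Proportion of the total that is ascribed to regression.
r_squared = ss_regression / ss_total

print(round(ss_regression, 3), round(ss_residual, 3), round(ss_total, 1))
print(round(r_squared, 4))
```

The ratio of the regression sum of squares to the total reproduces the squared multiple correlation of about .0256 used in interpretation A.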

B. Interpretation in terms of sampling significance. The partition
of the sum of squares leads to a test of significance for the
multiple correlation coefficient. The sum of squares of regressed
values

$$\frac{\Sigma(\hat{Y} - \bar{Y})^2}{\sigma^2} = \frac{R^2_{y.12}\,\Sigma(Y - \bar{Y})^2}{\sigma^2}$$

is distributed as $\chi^2$ with 2 degrees of freedom. The sum of squares
of residual errors

$$\frac{\Sigma(Y - \hat{Y})^2}{\sigma^2} = \frac{(1 - R^2_{y.12})\,\Sigma(Y - \bar{Y})^2}{\sigma^2}$$

is distributed as $\chi^2$ with $N - 3$ degrees of freedom. The two sums
of squares are independently distributed. Under the hypothesis that
$\rho_{y.12} = 0$, the two ratios

$$\frac{R^2_{y.12}\,\Sigma(Y - \bar{Y})^2}{2} \quad \text{and} \quad \frac{(1 - R^2_{y.12})\,\Sigma(Y - \bar{Y})^2}{N - 3}$$

are estimates of the same variance $\sigma^2$. Therefore the quotient

(13.8)  $F = \dfrac{R^2_{y.12}\,\Sigma(Y - \bar{Y})^2 / 2}{(1 - R^2_{y.12})\,\Sigma(Y - \bar{Y})^2 / (N - 3)} = \dfrac{R^2_{y.12}}{1 - R^2_{y.12}} \cdot \dfrac{N - 3}{2}$

has the F distribution with $n_1 = 2$ and $n_2 = N - 3$. This expression
gives a test of the hypothesis that in the population the
multiple correlation coefficient is zero.

For the 10 cases of Table 13.2,

$$F = \frac{.0256}{.9744} \cdot \frac{7}{2} = .09$$

It is apparent from interpretation A that the two tests under
consideration, difficult reading and artificial language, did not furnish
a usable prediction of midterm mark for the 10 cases observed and
it is apparent by interpretation B that the sample of 10 cases does
not contradict the hypothesis that in the population there is no
correlation between midterm mark and a weighted combination
of the two tests. The size of the sample affects the second
interpretation but not the first.

If there had been k predictor variables instead of 2 the ratio

(13.9)  $F = \dfrac{R^2_{y.12 \ldots k}}{1 - R^2_{y.12 \ldots k}} \cdot \dfrac{N - k - 1}{k}$

would be distributed as F with $n_1 = k$ and $n_2 = N - k - 1$.
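
As a small worked illustration of Formula (13.9), the ratio and its two degrees of freedom can be computed as follows. This is a sketch only; the function name `f_ratio` and its argument names are assumptions made for the example, and the resulting F must still be compared with a table of the F distribution.

```python
def f_ratio(r_squared, n_cases, k_predictors):
    """Return (F, n1, n2) for testing that the population multiple
    correlation is zero, following Formula (13.9)."""
    n1 = k_predictors
    n2 = n_cases - k_predictors - 1
    f = (r_squared / (1 - r_squared)) * (n2 / n1)
    return f, n1, n2


# The ten cases of Table 13.2: R squared = .0256 with two predictors.
f, n1, n2 = f_ratio(0.0256, 10, 2)
print(round(f, 2), n1, n2)
```

With k = 2 the function reduces to the two-predictor quotient given just before Formula (13.9).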

The Normal Equations. In order to generalize the regression 
equation with two predictors to a regression equation with any 
number of predictors it will be helpful to explore the rationale
more carefully. Such an equation with 5 predictors may be written
as

(13.10)  $\hat{Y} = A_{y.12345} + b_{y1.2345}X_1 + b_{y2.1345}X_2 + b_{y3.1245}X_3 + b_{y4.1235}X_4 + b_{y5.1234}X_5$

(13.11)  where $A_{y.12345} = \bar{Y} - b_{y1.2345}\bar{X}_1 - b_{y2.1345}\bar{X}_2 - b_{y3.1245}\bar{X}_3 - b_{y4.1235}\bar{X}_4 - b_{y5.1234}\bar{X}_5$

The pattern of subscripts is clear and can be easily adapted to 
any number of variables. As before, A and all the b’s are unknown 
and must be computed from the group data, while all the X's are 




known but vary from individual to individual. The criterion for 
obtaining the b's is that the sum of the squares of the errors of 
estimate, that is $\Sigma(Y - \hat{Y})^2$, shall be made as small as possible.
If values of the b’s are chosen so as to make this residual sum of 
squares a minimum those same b's will make the correlation 
between observed Y and estimated Y a maximum. Students 
familiar with the calculus will know the customary mathematical 
procedure for making the sum of squares a minimum. Others 
must take the end product for granted. That end product is a set 
of simultaneous linear equations which are called normal equations. 
(The term “normal” here has no reference whatever to the normal 
curve.) There are as many such equations in one problem as there 
are unknown b's. For a 5-predictor problem the set of normal
equations is 


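(13.12)

$$\begin{aligned}
b^*_{y1.2345} + r_{12}b^*_{y2.1345} + r_{13}b^*_{y3.1245} + r_{14}b^*_{y4.1235} + r_{15}b^*_{y5.1234} &= r_{y1} \\
r_{12}b^*_{y1.2345} + b^*_{y2.1345} + r_{23}b^*_{y3.1245} + r_{24}b^*_{y4.1235} + r_{25}b^*_{y5.1234} &= r_{y2} \\
r_{13}b^*_{y1.2345} + r_{23}b^*_{y2.1345} + b^*_{y3.1245} + r_{34}b^*_{y4.1235} + r_{35}b^*_{y5.1234} &= r_{y3} \\
r_{14}b^*_{y1.2345} + r_{24}b^*_{y2.1345} + r_{34}b^*_{y3.1245} + b^*_{y4.1235} + r_{45}b^*_{y5.1234} &= r_{y4} \\
r_{15}b^*_{y1.2345} + r_{25}b^*_{y2.1345} + r_{35}b^*_{y3.1245} + r_{45}b^*_{y4.1235} + b^*_{y5.1234} &= r_{y5}
\end{aligned}$$

It should be noted that the unknowns in these equations are b*'s
but the b's can be obtained from them by multiplying by the ratio
of standard deviations,

(13.13)  $b_{y5.1234} = b^*_{y5.1234}\,\dfrac{s_y}{s_5}$
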


and similarly for the other b’s. All r’s are known, having been 
computed from the observed data. The correlation of every 
variable with every other variable is required. 

Numerical values for the b*’s can be obtained either by solving 
the normal equations symbolically and substituting the known 
values of the r's in the symbolic solutions or by substituting the 
numerical values of the r's directly in the normal equations and 
solving the resulting equations for the b*'s. The former method
produces a set of formulas for the b*’s, and when there are only 
two or three predictor variables it is the simpler method to use. 
It has already been illustrated for the data of Table 13.2. When 
there are many predictor variables the formulas for the b*'s 
become complicated and substitution in them involves a large 
number of steps. The solution can then be obtained much more
economically by substituting the r's in the normal equations and
employing an efficient routine for solution of the resulting numer- 
ical equations. Such a routine will be developed in later sections 
of this chapter. 

No matter which method is employed for obtaining the numer- 
ical values of the b*'s, the multiple correlation can be found by 
the formula 


(13.14)

$$R_{y.12345} = \sqrt{r_{y1}b^*_{y1.2345} + r_{y2}b^*_{y2.1345} + r_{y3}b^*_{y3.1245} + r_{y4}b^*_{y4.1235} + r_{y5}b^*_{y5.1234}}$$

This formula is readily generalized to any number of variables. 
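
The numerical route, substituting the r's in the normal equations and solving, can be sketched for any number of predictors. The code below is an illustration, not the routine developed later in the chapter: it uses ordinary Gaussian elimination with row interchange, and the names `solve_normal_equations` and `multiple_r` are hypothetical. It is run on the two-predictor correlations of Table 13.2.

```python
from math import sqrt


def solve_normal_equations(r, r_y):
    """Solve sum_j r[i][j] * b_star[j] = r_y[i] for the b*'s.

    r   : square matrix of correlations among the predictors
    r_y : correlations of the criterion with each predictor
    """
    m = len(r_y)
    # Augmented matrix [r | r_y]; copying keeps the inputs unchanged.
    a = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(r, r_y)]
    for col in range(m):
        # Bring the largest remaining entry of this column to the diagonal.
        pivot = max(range(col, m), key=lambda i: abs(a[i][col]))
        a[col], a[pivot] = a[pivot], a[col]
        # Eliminate this unknown from every later equation.
        for row in range(col + 1, m):
            factor = a[row][col] / a[col][col]
            for j in range(col, m + 1):
                a[row][j] -= factor * a[col][j]
    # Back substitution, last unknown first.
    b_star = [0.0] * m
    for i in reversed(range(m)):
        known = sum(a[i][j] * b_star[j] for j in range(i + 1, m))
        b_star[i] = (a[i][m] - known) / a[i][i]
    return b_star


def multiple_r(r_y, b_star):
    """Generalization of Formula (13.14) to any number of predictors."""
    return sqrt(sum(p * q for p, q in zip(r_y, b_star)))


# Two-predictor correlations from Table 13.2.
r = [[1.0, 0.496],
     [0.496, 1.0]]
r_y = [0.073, 0.160]

b_star = solve_normal_equations(r, r_y)
R = multiple_r(r_y, b_star)
print([round(v, 4) for v in b_star], round(R, 2))
```

The same two functions accept a 5-by-5 matrix of r's without change, which is the sense in which the pattern generalizes.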


EXERCISE 13.1 


1. For the cases in Table 13.1 the data of Table 13.3 were obtained. 
By substitution in Formulas (13.1), (13.2), and (13.3) obtain the regression 
equation to estimate midterm test scores from scores on the two prognostic 
tests. Compare it with the equation obtained from the 10 cases of Table 
13.2. 

2. By substitution in Formula (13.6) obtain the multiple correlation
$R_{y.12}$ and compare it with the value obtained from the 10 cases of Table 13.2.

3. Test the hypothesis that $\rho_{y.12} = 0$. Why is that hypothesis so much
less credible in the light of the data from all 98 cases than in the light 
of the data from 10 cases? 

4. Suppose a criterion variable Y is to be estimated from three predictor
variables named $X_3$, $X_5$ and $X_7$. (a) Write a regression equation similar
to equation (13.10) with subscripts properly placed. Note that the order
in which the three predictors are arranged is unimportant and that
therefore the order of secondary subscripts is unimportant. (b) Write the three
normal equations similar to those in Formula (13.12). (c) Write a formula
for $R_{y.357}$ similar to Formula (13.14).


5. If a criterion trait Y is to be estimated from two other traits X4 
and Xs, write out with appropriate subscripts the formulas for 


(a) 5%. (е) A = constant term in the regression equation 
(b) шл (f) В, 

(с) bus (g) The normal equations 

(d) bua 


6. If a criterion trait $X_1$ is to be estimated from traits $X_3$ and $X_7$, write
out with appropriate subscripts the formulas for

(a) $b^*_{13.7}$    (c) $b_{13.7}$    (e) A             (g) The 3 normal equations
(b) $b^*_{17.3}$    (d) $b_{17.3}$    (f) $R_{1.37}$


The Doolittle Method of Solving the Normal Equations by
Successive Elimination. If there are more than two predictors
in the regression equation, it is a laborious process to compute
the regression coefficients by substituting the numerical values
of the r's in formulas obtained by solving the normal equations
symbolically, as was done in the preceding section for the case of
two predictors. With more than two predictors it is easier to
substitute the numerical values of the r's in the normal equations
themselves, and then to solve the resulting equations. There are
several commonly used methods for solving first degree equations,
and the normal equations are of first degree. One of the most
economical of such methods is named for M. H. Doolittle. He
improved on the method first used by Gauss and published his method
in the Report for 1878 of the U.S. Coast and Geodetic Survey.

This method simplifies computation by taking advantage of 
certain symmetry relations among the correlations, namely that

$r_{12} = r_{21}$, $r_{13} = r_{31}$, etc.

The Doolittle Method will be developed in some detail for the 
case in which there are only three predictors in order that the 
student may understand its rationale. For this purpose we may 
use the three prognostic tests of Table 13.1 as a basis for predicting 
midterm test scores. The pertinent data for all 98 cases are 
presented in Table 13.3. The three normal equations are 

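(13.15)
$$\begin{aligned}
b^*_{y1.23} + r_{12}b^*_{y2.13} + r_{13}b^*_{y3.12} &= r_{y1} \\
r_{12}b^*_{y1.23} + b^*_{y2.13} + r_{23}b^*_{y3.12} &= r_{y2} \\
r_{13}b^*_{y1.23} + r_{23}b^*_{y2.13} + b^*_{y3.12} &= r_{y3}
\end{aligned}$$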


TABLE 13.3 Correlations of Midterm Marks and Scores on Three Prognostic
Tests for Students in a First Course in Statistical Method.

                                                      Correlation
Test                         Mean       s       X1      X2      X3      Y
X1 Difficult reading         35.69     7.68             .321    .477    .357
X2 Artificial language       46.43    11.03     .321            .539    .620
X3 Arithmetic-Algebra        31.38     7.28     .477    .539            .518
Y  Midterm                   52.46     9.33     .357    .620    .518


When the computed values of the r's are substituted in Equations
(13.15) and the results are simplified by writing $-u_1 = b^*_{y1.23}$,
$-u_2 = b^*_{y2.13}$ and $-u_3 = b^*_{y3.12}$, and both sides of the resulting
equations are multiplied by $-1$, we have the new set of equations

(13.16)
(1)       $u_1 + .321u_2 + .477u_3 = -.357$
(2)   $.321u_1 +     u_2 + .539u_3 = -.620$
(3)   $.477u_1 + .539u_2 +     u_3 = -.518$




The methods of solution of simultaneous equations studied in 
high school algebra are applicable, but unless steps are set down 
in an orderly routine there will be much waste effort. The student 
should study carefully the routine set forth in Table 13.4 until 


TABLE 13.4 Detailed Solution of Equations (13.16)

THE FORWARD SOLUTION

Line    Step
[1.1]   Write equation 1              u1 + .321 u2 + .477 u3 = -.357
[1.2]   [1.1] x (-1)                 -u1 - .321 u2 - .477 u3 =  .357
[2.1]   Write equation 2         .321 u1 +      u2 + .539 u3 = -.620
[2.2]   [1.1] x (-.321)         -.321 u1 - .103 u2 - .153 u3 =  .115
[2.3]   [2.1] + [2.2]                  0 + .897 u2 + .386 u3 = -.505
[2.4]   [2.3] / (-.897)                0 -      u2 - .430 u3 =  .563
[3.1]   Write equation 3         .477 u1 + .539 u2 +      u3 = -.518
[3.2]   [1.1] x (-.477)         -.477 u1 - .153 u2 - .228 u3 =  .170
[3.3]   [2.3] x (-.430)                0 - .386 u2 - .166 u3 =  .217
[3.4]   [3.1] + [3.2] + [3.3]          0 +       0 + .606 u3 = -.131
[3.5]   [3.4] / (-.606)                0 +       0 -      u3 =  .216

THE BACK SOLUTION

[1.1B]  Enter [3.5]                                     - u3 =  .216
[2.1B]  Enter [2.4]                           - u2 - .430 u3 =  .563
[2.2B]  [1.1B] x (-.430)                             .430 u3 = -.093
[2.3B]  [2.1B] + [2.2B]                       - u2 +       0 =  .470
[3.1B]  Enter [1.2]                 - u1 - .321 u2 - .477 u3 =  .357
[3.2B]  [1.1B] x (-.477)                             .477 u3 = -.103
[3.3B]  [2.3B] x (-.321)                   .321 u2 +       0 = -.152
[3.4B]  [3.1B] + [3.2B] + [3.3B]    - u1 +       0 +       0 =  .102


he understands it fully and then should compare it with Table 13.5 
so that he may understand how to generalize that abbreviated 
solution to a pattern accommodating any number of predictor 
variables. In Table 13.5 letters representing variables $u_1$, $u_2$
and $u_3$ have been omitted and only their coefficients printed.
Certain unnecessary terms have also been omitted as well as signs
of operation and equality.

The solution proceeds in cycles, there being as many cycles
as there are unknowns, and one unknown being eliminated in each
cycle. The lines have been numbered in such a way as to indicate
both the cycle and the row within the cycle. Thus [3.2] indicates
the 2d row in cycle 3.




The forward solution proceeds by successive elimination of 
one unknown after another until a value is found for one of the 
unknowns. Then the back solution consists of taking the values 


TABLE 13.5 Abbreviated Solution of Equations (13.16)

THE FORWARD SOLUTION

Line    Step                          X1       X2       X3        Y     | Check
[1.1]   Insert r1j's                1.000     .321     .477    -.357    |  1.441
[1.2]   [1.1] x (-1)               -1.000    -.321    -.477     .357    | -1.441
[2.1]   Insert r2j's                 .321    1.000     .539    -.620    |  1.240
[2.2]   [1.1] x (-.321)                      -.103    -.153     .115    |  -.463
[2.3]   [2.1] + [2.2]                         .897     .386    -.505    |   .777
[2.4]   [2.3] / (-.897)                     -1.000    -.430     .563    |  -.866
[3.1]   Insert r3j's                 .477     .539    1.000    -.518    |  1.498
[3.2]   [1.1] x (-.477)                               -.228     .170    |  -.687
[3.3]   [2.3] x (-.430)                               -.166     .217    |  -.334
[3.4]   [3.1] + [3.2] + [3.3]                          .606    -.131    |   .477
[3.5]   [3.4] / (-.606)                              -1.000     .216    |  -.787

THE BACK SOLUTION

[1.1B]  Enter [3.5]                                  -1.000     .216    |  -.787
[2.1B]  Enter [2.4]                         -1.000    -.430     .563    |  -.866
[2.2B]  [1.1B] x (-.430)                               .430    -.093    |   .338
[2.3B]  [2.1B] + [2.2B]                     -1.000              .470    |  -.528
[3.1B]  Enter [1.2]                -1.000    -.321    -.477     .357    | -1.441
[3.2B]  [1.1B] x (-.477)                               .477    -.103    |   .375
[3.3B]  [2.3B] x (-.321)                      .321             -.152    |   .169
[3.4B]  [3.1B] + [3.2B] + [3.3B]   -1.000                       .102    |  -.897

$b^*_{y3.12} = .216$,  $b^*_{y2.13} = .470$,  $b^*_{y1.23} = .102$

$R_{y.123} = \sqrt{r_{y1}b^*_{y1.23} + r_{y2}b^*_{y2.13} + r_{y3}b^*_{y3.12}} = .663$


of those unknowns which have already been found and substitut- 
ing them in equations to obtain values of the remaining unknowns. 

Each cycle may be thought of as having a first line, a last line, 
a next-to-the-last line, and a set of "other" lines. (The first 
cycle is an exception, having only 2 lines.) The first step in each 
cycle of the forward solution is to record the values of the appropriate
r's. In the third cycle these are the correlations of $X_3$ with
the other variables, and similarly in a problem involving many
predictors the correlations with $X_k$ are recorded on the first line
of cycle $k$. The column headed $X_3$ receives the correlations of $X_3$
with other variables and in general in a problem in which there
are many predictors the column headed $X_k$ receives correlations
of $X_k$ with other variables. The criterion variable is placed at
the right of the predictors. This step is equivalent to recording 
the original equations in the detailed solution. 

Following the line on which r’s are entered, each cycle has one 
line obtained from each of the preceding cycles, as worked out in 
detail in Table 13.6 for cycle 5. Thus in a problem with many 


TABLE 13.6 Detailed Steps for Cycle 5 in Forward Solution

[5.1] Enter the correlation of $X_5$ with each of the other variables, placing
      $r_{51}$ in column $X_1$, $r_{52}$ in column $X_2$, etc. Enter the sum of all
      these correlations in the check column.

[5.2] Multiply line [1.1] in cycle 1 by the entry in column $X_5$ and line [1.2].
      Do not record any values to the left of column $X_5$. Multiply the entry
      in check column of [1.1] by same multiplier from line [1.2] and enter
      result in check column of [5.2].

[5.3] Multiply line [2.3] in cycle 2 (including the entry in check column) by the
      entry in column $X_5$ and line [2.4], but do not record any values to the left
      of column $X_5$.

[5.4] Multiply line [3.4] in cycle 3 (including the entry in check column) by
      the entry in column $X_5$ and line [3.5].

[5.5] Multiply line [4.5] in cycle 4 (including the entry in check column) by
      the entry in column $X_5$ and line [4.6].

[5.6] Add lines [5.1] to [5.5].

[5.7] Divide line [5.6] by the first entry in that line, that is, the entry in column
      $X_5$, after having first changed the sign of the divisor. The first entry in
      [5.7] should be -1.000. The entry now standing in the check column
      should be equal to the sum of the other entries in line [5.7].


predictors, cycle $k$ would have $k - 1$ such lines. Each of these
lines is obtained by multiplying the entries in the next-to-the-last
line in one of the preceding cycles by an entry in the last line of
that cycle. Thus line [1.1] is multiplied by an entry in [1.2] and
line [2.3] by an entry in [2.4]. In a problem with many predictors
this would be continued until all the preceding cycles had been
covered. Each of the multiplying numbers is taken from column
$X_k$ when one is working in cycle $k$ and entries to the left of column
$X_k$ are not recorded. This is the equivalent of multiplying the
coefficients of the unknowns in a particular equation by a pertinent
or helpful constant so that when equations are added one unknown
will disappear.




The next-to-the-last line in each cycle is the sum of all the pre- 
ceding lines in that cycle. Examination of the detailed solution will 
show that at this step one unknown is eliminated from the equations. 

The last line in each cycle is obtained by multiplying each 
entry in the preceding line by the negative reciprocal of the first 
entry. This step makes the coefficient of the leading term —1, as 
it is in the detailed solution. 

In the back solution the first line of each cycle is the last line of a 
cycle in the forward solution, the cycles being taken in reverse order. 
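
The cycle structure described above can be sketched in code. The following Python is an illustrative reading of the routine, not a transcription of the worksheet: the function name `doolittle` and the list names are assumptions, and full floating-point precision is carried instead of three decimals, so results agree with the tables only to about the third decimal. As in the tables, the criterion correlations are entered with their signs changed, and the method relies on the symmetry of the r's.

```python
from math import sqrt


def doolittle(r, r_y):
    """Forward and back solution for the b*'s by successive elimination.

    r   : symmetric matrix of correlations among the m predictors
    r_y : correlations of the criterion with the predictors
    """
    m = len(r_y)
    sum_lines = []    # next-to-the-last line of each cycle
    last_lines = []   # last line of each cycle (leading entry -1)

    # Forward solution: one cycle for each unknown.
    for k in range(m):
        # First line of the cycle: the r's, criterion with sign changed.
        line = [float(v) for v in r[k]] + [-float(r_y[k])]
        # One line from each preceding cycle: its next-to-the-last line
        # times the column-k entry of its last line.
        for c in range(k):
            multiplier = last_lines[c][k]
            for j in range(k, m + 1):
                line[j] += sum_lines[c][j] * multiplier
        # Entries left of column k cancel and are not recorded.
        for j in range(k):
            line[j] = 0.0
        sum_lines.append(line)
        # Last line: multiply by the negative reciprocal of the first entry.
        last_lines.append([-v / line[k] for v in line])

    # Back solution: cycles taken in reverse order.
    b_star = [0.0] * m
    for k in reversed(range(m)):
        carried = sum(last_lines[k][j] * b_star[j] for j in range(k + 1, m))
        b_star[k] = last_lines[k][m] + carried
    return b_star


# Correlations of Table 13.3.
r = [[1.000, 0.321, 0.477],
     [0.321, 1.000, 0.539],
     [0.477, 0.539, 1.000]]
r_y = [0.357, 0.620, 0.518]

b_star = doolittle(r, r_y)
R = sqrt(sum(p * q for p, q in zip(r_y, b_star)))
print([round(v, 3) for v in b_star], round(R, 3))
```

Keeping the next-to-the-last and last lines of every cycle in two separate lists mirrors the worksheet, where exactly those two lines of each cycle are reused in later cycles and in the back solution.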

Checks. Substitution of the computed values of the standard
regression coefficients, the b*'s, in the normal equations provides
a check on all b*'s at once. For the data under consideration
substitution in the equations

$$\begin{aligned}
b^*_{y1.23} + r_{12}b^*_{y2.13} + r_{13}b^*_{y3.12} &= r_{y1} \\
r_{12}b^*_{y1.23} + b^*_{y2.13} + r_{23}b^*_{y3.12} &= r_{y2} \\
r_{13}b^*_{y1.23} + r_{23}b^*_{y2.13} + b^*_{y3.12} &= r_{y3}
\end{aligned}$$

would yield the following check:

.102 + (.321)(.470) + (.477)(.216) = .3559  while  $r_{y1} = .357$
(.321)(.102) + (.470) + (.539)(.216) = .6192  while  $r_{y2} = .620$
(.477)(.102) + (.539)(.470) + (.216) = .5180  while  $r_{y3} = .518$

Most experienced computers like to check their work step by 
step, so that if a mistake is made it may be caught immediately. 
For this purpose a check column is added at the right of the work- 
sheet. For the first line in each cycle, the entry in this column is 
the sum of the r’s entered on that line. All other entries in the 
column are obtained by the same set of directions which apply to 
the other columns. The check consists of ascertaining that the 
entry in the check column is equal to the sum of the other entries 
for the last line and for the next-to-the-last line in each cycle. 
In general, that relation will not hold for the other lines in the 
forward solution because of the omission of certain terms. It 
holds for all lines in the back solution. Even if the check column 
indicates no error, the check by substitution in the normal equa- 
tions should be made because the entries in the check column do 
not provide an infallible check. 
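
The check by substitution is easy to carry out mechanically. The sketch below is an illustration with assumed names; it substitutes the three-decimal b*'s into the left sides of the normal equations and prints each result beside the observed criterion correlation.

```python
# Correlations among the predictors and with the criterion (Table 13.3).
r = [[1.000, 0.321, 0.477],
     [0.321, 1.000, 0.539],
     [0.477, 0.539, 1.000]]
r_y = [0.357, 0.620, 0.518]

# Standard regression coefficients as found by the Doolittle solution.
b_star = [0.102, 0.470, 0.216]


def substitution_check(r, b_star):
    """Left-hand sides of the normal equations for the given b*'s."""
    return [sum(r_ij * b_j for r_ij, b_j in zip(row, b_star)) for row in r]


reproduced = substitution_check(r, b_star)
for got, want in zip(reproduced, r_y):
    print(f"{got:.4f} while r = {want:.3f}")
```

Differences in the third decimal reflect only the rounding of the b*'s to three places.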

Fisher Modification of the Doolittle Method. The procedures 
described in the preceding section serve to obtain a multiple 
regression equation, and a multiple correlation coefficient and to 
test the significance of that correlation coefficient. To test the 
significance of the b*'s in the regression equation another approach 




is necessary. The method to be described in this section will 
provide the standard error of a coefficient of partial regression and 
will thus make it possible (1) to test hypotheses about such 
coefficients, (2) to make an interval estimate of the population 
regression coefficient, and (3) to predict several different criterion 
variables from the same set of predictors. 

The discussion will be carried out for the case of three predictors 
and illustrated by the data already used to predict midterm test 
scores from three prognostic tests. Here we shall (1) test hypoth- 
eses concerning the regression coefficients in the population, 
(2) make interval estimates concerning them, and (3) obtain two 
regression equations from the same predictors, one with midterm 
test scores as criterion and one with final examination grades as 
criterion. For these purposes certain supplementary values are
needed. 

Two explanations will now be offered. Explanation A should 
be helpful for persons who have studied matrix algebra and 
incomprehensible to most others. The latter need not look at 
it but may go on at once to explanation B. Persons who can read 
explanation A will not need explanation B. Those who cannot 
read explanation A will probably have to take the routine more 
or less on faith and need not strain to understand the underlying 
rationale. 


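A. Explanation in terms of matrices. The normal Equations
(13.15) may be written in matrix notation as

(13.17)
$$\begin{bmatrix} 1 & r_{12} & r_{13} \\ r_{12} & 1 & r_{23} \\ r_{13} & r_{23} & 1 \end{bmatrix}
\begin{bmatrix} b^*_{y1.23} \\ b^*_{y2.13} \\ b^*_{y3.12} \end{bmatrix} =
\begin{bmatrix} r_{y1} \\ r_{y2} \\ r_{y3} \end{bmatrix}$$

When expanded, this becomes Equations (13.21) of explanation B.
If both sides are premultiplied by the inverse of the matrix of r's,
we have

(13.18)
$$\begin{bmatrix} 1 & r_{12} & r_{13} \\ r_{12} & 1 & r_{23} \\ r_{13} & r_{23} & 1 \end{bmatrix}^{-1}
\begin{bmatrix} 1 & r_{12} & r_{13} \\ r_{12} & 1 & r_{23} \\ r_{13} & r_{23} & 1 \end{bmatrix}
\begin{bmatrix} b^*_{y1.23} \\ b^*_{y2.13} \\ b^*_{y3.12} \end{bmatrix} =
\begin{bmatrix} 1 & r_{12} & r_{13} \\ r_{12} & 1 & r_{23} \\ r_{13} & r_{23} & 1 \end{bmatrix}^{-1}
\begin{bmatrix} r_{y1} \\ r_{y2} \\ r_{y3} \end{bmatrix}$$

Hence

(13.19)
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} b^*_{y1.23} \\ b^*_{y2.13} \\ b^*_{y3.12} \end{bmatrix} =
\begin{bmatrix} C_{11} & C_{12} & C_{13} \\ C_{21} & C_{22} & C_{23} \\ C_{31} & C_{32} & C_{33} \end{bmatrix}
\begin{bmatrix} r_{y1} \\ r_{y2} \\ r_{y3} \end{bmatrix}$$

When expanded (13.19) becomes Equations (13.22) of explanation
B. The goal now is to obtain the values of the C's which are the
elements of the inverse matrix. The product of the matrix of
r's by its inverse, the matrix of C's, is the identity matrix

(13.20)
$$\begin{bmatrix} 1 & r_{12} & r_{13} \\ r_{12} & 1 & r_{23} \\ r_{13} & r_{23} & 1 \end{bmatrix}
\begin{bmatrix} C_{11} & C_{12} & C_{13} \\ C_{21} & C_{22} & C_{23} \\ C_{31} & C_{32} & C_{33} \end{bmatrix} =
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

When expanded (13.20) becomes Equation (13.23) of explanation
B. There are nine equations in (13.20) and 9 unknown C's to be
computed. However, because of the symmetrical nature of the
matrix of r's, the matrix of C's is also symmetrical and $C_{12} = C_{21}$,
$C_{13} = C_{31}$, $C_{23} = C_{32}$. The C's which are elements of the inverse
matrix are sometimes called elements of the inverse solution, or from
relations (13.19), multipliers or inverse multipliers. They will be
called multipliers in this book.

In section C a routine for obtaining these C's will be developed.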
B. Explanation without matrices. The normal equations

(13.21)
$$\begin{aligned}
1 \cdot b^*_{y1.23} + r_{12}b^*_{y2.13} + r_{13}b^*_{y3.12} &= r_{y1} \\
r_{12}b^*_{y1.23} + 1 \cdot b^*_{y2.13} + r_{23}b^*_{y3.12} &= r_{y2} \\
r_{13}b^*_{y1.23} + r_{23}b^*_{y2.13} + 1 \cdot b^*_{y3.12} &= r_{y3}
\end{aligned}$$

might be thought of as equations to obtain the values $r_{y1}$, $r_{y2}$, $r_{y3}$
from the b*'s. However what is usually needed are the inverse
expressions for the b*'s in terms of the correlations $r_{y1}$, $r_{y2}$, $r_{y3}$
and the correlations among the predictors $r_{12}$, $r_{13}$, $r_{23}$. Equations
(13.21) may be solved for the b*'s but when there are more than
two predictors the symbolic solutions are complicated. These
solutions may be indicated by the set of inverse equations

(13.22)
$$\begin{aligned}
b^*_{y1.23} &= C_{11}r_{y1} + C_{12}r_{y2} + C_{13}r_{y3} \\
b^*_{y2.13} &= C_{21}r_{y1} + C_{22}r_{y2} + C_{23}r_{y3} \\
b^*_{y3.12} &= C_{31}r_{y1} + C_{32}r_{y2} + C_{33}r_{y3}
\end{aligned}$$

where each C represents an expression involving the correlations
$r_{12}$, $r_{13}$, $r_{23}$. These expressions for the C's become more and more
complex as the number of predictor variables increases. Certain
convenient relations among the C's and r's can be readily established
by matrix algebra but will probably have to be taken on
faith by the person who does not understand that subject. These
relations are

(13.23)
$$\begin{aligned}
1 \cdot C_{11} + r_{12}C_{21} + r_{13}C_{31} &= 1 \\
r_{12}C_{11} + 1 \cdot C_{21} + r_{23}C_{31} &= 0 \\
r_{13}C_{11} + r_{23}C_{21} + 1 \cdot C_{31} &= 0 \\[6pt]
1 \cdot C_{12} + r_{12}C_{22} + r_{13}C_{32} &= 0 \\
r_{12}C_{12} + 1 \cdot C_{22} + r_{23}C_{32} &= 1 \\
r_{13}C_{12} + r_{23}C_{22} + 1 \cdot C_{32} &= 0 \\[6pt]
1 \cdot C_{13} + r_{12}C_{23} + r_{13}C_{33} &= 0 \\
r_{12}C_{13} + 1 \cdot C_{23} + r_{23}C_{33} &= 0 \\
r_{13}C_{13} + r_{23}C_{23} + 1 \cdot C_{33} &= 1
\end{aligned}$$

The nine Equations (13.23) can be compactly written as

(13.24)
$$\begin{aligned}
1 \cdot C_{1j} + r_{12}C_{2j} + r_{13}C_{3j} &= 1,\ 0,\ 0 \\
r_{12}C_{1j} + 1 \cdot C_{2j} + r_{23}C_{3j} &= 0,\ 1,\ 0 \\
r_{13}C_{1j} + r_{23}C_{2j} + 1 \cdot C_{3j} &= 0,\ 0,\ 1
\end{aligned}$$

The C's may be called elements of the inverse solution or from the
relations (13.23) multipliers, or inverse multipliers. They will be
called multipliers in this book.
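
For readers who want to see the multipliers numerically, the relations above say that the C's form the inverse of the matrix of r's. The sketch below is an illustration only and is not the worksheet routine of the text: it inverts the correlation matrix of Table 13.3 by Gauss-Jordan elimination (a swapped-in technique) and then applies Equations (13.22). The function name `invert` is hypothetical.

```python
def invert(matrix):
    """Inverse of a square matrix by Gauss-Jordan elimination."""
    m = len(matrix)
    # Augment the matrix with the identity: [matrix | I].
    a = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(m)]
         for i, row in enumerate(matrix)]
    for col in range(m):
        # Row interchange keeps the pivot as large as possible.
        pivot = max(range(col, m), key=lambda i: abs(a[i][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        # Clear this column in every other row.
        for row in range(m):
            if row != col:
                factor = a[row][col]
                a[row] = [v - factor * w for v, w in zip(a[row], a[col])]
    # The right-hand half now holds the inverse.
    return [row[m:] for row in a]


# Correlations among the three predictors (Table 13.3).
r = [[1.000, 0.321, 0.477],
     [0.321, 1.000, 0.539],
     [0.477, 0.539, 1.000]]
r_y = [0.357, 0.620, 0.518]

C = invert(r)                      # the multipliers
# Equations (13.22): each b* is a weighted sum of criterion correlations.
b_star = [sum(c * ry for c, ry in zip(row, r_y)) for row in C]

for row in C:
    print([round(v, 3) for v in row])
print([round(v, 3) for v in b_star])
```

Because the C's depend only on the correlations among the predictors, they can be computed once and reused for any criterion.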

C. Routine for computing the multipliers. The goal is to obtain
values of the C's which may be substituted in Equations (13.22)
to obtain the b*'s. The computational form is set up in the manner
already used for the abbreviated solution in Table 13.5 except that
the column for criterion Y is omitted and in its place are several
columns labeled $C_{i1}$, $C_{i2}$, $C_{i3}$. In a problem having m predictor
variables there would be m such columns. For a 3-predictor
problem the column labeled $C_{i1}$ will eventually lead to the values
$C_{31}$, $C_{21}$, and $C_{11}$, the column labeled $C_{i2}$ will lead to the values
$C_{32}$, $C_{22}$ and $C_{12}$, and the column labeled $C_{i3}$ will lead to $C_{33}$, $C_{23}$
and $C_{13}$.

The numbers to be entered on the first lines of the forward
solution are, for m predictors, as follows:

Line     X1      X2      X3     ...    Xm    |  C_i1   C_i2   C_i3   ...   C_im
[1.1]    1       r12     r13    ...    r1m   |   -1      0      0    ...     0
[2.1]    r21     1       r23    ...    r2m   |    0     -1      0    ...     0
[3.1]    r31     r32     1      ...    r3m   |    0      0     -1    ...     0
 ...
[m.1]    rm1     rm2     rm3    ...    1     |    0      0      0    ...    -1

The succeeding steps are exactly like those of the previous solution.
Values of the C's will appear on the last lines of various
cycles in the columns headed $C_{i1}$, $C_{i2}$, $C_{i3}$. These have been taken
out and set down in a separate tabulation where their symmetry
can be noted. That symmetry provides one check on the computations.
The b*'s are now obtained by substituting the C's in


TABLE 13.7 Fisher-Doolittle Procedure for Computing the Multipliers

THE FORWARD SOLUTION

Line    Step                          X1       X2       X3    |   C_i1     C_i2     C_i3   |  Check
[1.1]   Insert r1j's                1.000     .321     .477   |  -1.000     0        0     |   .798
[1.2]   [1.1] x (-1)               -1.000    -.321    -.477   |   1.000     0        0     |  -.798
[2.1]   Insert r2j's                 .321    1.000     .539   |    0      -1.000     0     |   .860
[2.2]   [1.1] x (-.321)                      -.103    -.153   |    .321     0        0     |  -.256
[2.3]   [2.1] + [2.2]                         .897     .386   |    .321   -1.000     0     |   .604
[2.4]   [2.3] / (-.897)                     -1.000    -.430   |   -.358    1.115     0     |  -.673
[3.1]   Insert r3j's                 .477     .539    1.000   |    0        0      -1.000  |  1.016
[3.2]   [1.1] x (-.477)                               -.228   |    .477     0        0     |  -.381
[3.3]   [2.3] x (-.430)                               -.166   |   -.138     .430     0     |  -.260
[3.4]   [3.1] + [3.2] + [3.3]                          .606   |    .339     .430   -1.000  |   .375
[3.5]   [3.4] / (-.606)                              -1.000   |   -.559    -.710    1.650  |  -.619

THE BACK SOLUTION

[1.1B]  Enter [3.5]                                  -1.000   |   -.559    -.710    1.650  |  -.619
[2.1B]  Enter [2.4]                         -1.000    -.430   |   -.358    1.115     0     |  -.673
[2.2B]  [1.1B] x (-.430)                               .430   |    .240     .305    -.710  |   .266
[2.3B]  [2.1B] + [2.2B]                     -1.000            |   -.118    1.420    -.710  |  -.407
[3.1B]  Enter [1.2]                -1.000    -.321    -.477   |   1.000     0        0     |  -.798
[3.2B]  [1.1B] x (-.477)                               .477   |    .267     .339    -.787  |   .295
[3.3B]  [2.3B] x (-.321)                      .321            |    .038    -.456     .228  |   .131
[3.4B]  [3.1B] + [3.2B] + [3.3B]   -1.000                     |   1.305    -.117    -.559  |  -.372

Values of the C's Obtained from Fisher-Doolittle Solution

Line       C_i1               C_i2               C_i3
[1.1B]     C31 =  -.559       C32 =  -.710       C33 =  1.650
[2.3B]     C21 =  -.118       C22 =  1.420       C23 =  -.710
[3.4B]     C11 =  1.305       C12 =  -.117       C13 =  -.559




Equations (13.22). These are seen to agree with the previously
computed values of the b*'s except for small differences in the last
digit due to rounding errors. The complete solution appears in the
Fisher-Doolittle table above.

Values of b*'s to Predict Midterm Test Scores

Obtained by substituting C's in (13.22)                                   Previously computed
$b^*_{y1.23}$ =  1.305(.357) -  .117(.620) -  .559(.518) = .104                  .102
$b^*_{y2.13}$ = -.117(.357) + 1.420(.620) -  .710(.518) = .471                  .470
$b^*_{y3.12}$ = -.559(.357) -  .710(.620) + 1.650(.518) = .215                  .216

$b_{y1.23} = b^*_{y1.23}\, s_y/s_1 = .124$;   $b_{y2.13} = .398$;   $b_{y3.12} = .277$;

$A = \bar{Y} - b_{y1.23}\bar{X}_1 - b_{y2.13}\bar{X}_2 - b_{y3.12}\bar{X}_3 = 20.86$

Then $\hat{Y} = 20.86 + .124X_1 + .398X_2 + .277X_3$

Regression Equation for a Different Criterion and the Same 
Predictors. The same set of C's can now be used to obtain the 
regression equation to predict a different criterion variable, if 
the correlations of that criterion with the predictors are available. 
From the data of Table 13.1, the correlations of the final examina- 
tion with each of the three predictors can be found. Let final 
examination scores be designated Z. Then 

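$r_{z1} = .325 \qquad r_{z2} = .508 \qquad r_{z3} = .502$

$s_z = 9.47 \qquad \bar{Z} = 49.15$

$b^*_{z1.23} = 1.305(.325) - .117(.508) - .559(.502) = .084$
$b^*_{z2.13} = -.117(.325) + 1.420(.508) - .710(.502) = .327$
$b^*_{z3.12} = -.559(.325) - .710(.508) + 1.650(.502) = .286$

Check: $b^*_{z1.23} + r_{12}b^*_{z2.13} + r_{13}b^*_{z3.12} = .3254$ and $r_{z1} = .325$
$r_{12}b^*_{z1.23} + b^*_{z2.13} + r_{23}b^*_{z3.12} = .5081$ and $r_{z2} = .508$
$r_{13}b^*_{z1.23} + r_{23}b^*_{z2.13} + b^*_{z3.12} = .5023$ and $r_{z3} = .502$

Then $R^2_{z.123} = r_{z1}b^*_{z1.23} + r_{z2}b^*_{z2.13} + r_{z3}b^*_{z3.12} = .3370$

$R_{z.123} = \sqrt{.3370} = .58$

To obtain the regression equation we need

$b_{z1.23} = b^*_{z1.23}\,\dfrac{s_z}{s_1} = .084\left(\dfrac{9.47}{7.68}\right) = .104$
$b_{z2.13} = b^*_{z2.13}\,\dfrac{s_z}{s_2} = .281$
$b_{z3.12} = b^*_{z3.12}\,\dfrac{s_z}{s_3} = .372$

$A = \bar{Z} - b_{z1.23}\bar{X}_1 - b_{z2.13}\bar{X}_2 - b_{z3.12}\bar{X}_3 = 20.77$
$\tilde{Z} = 20.77 + .104X_1 + .281X_2 + .373X_3$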
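As a rough illustration of reusing the C's for a second criterion, the sketch below recomputes the standard weights for the final examination Z, the squared multiple correlation as the sum of products of each criterion correlation with its b*, and the first raw-score coefficient. All numbers are those quoted in the text; the variable names are illustrative only.

```python
import math

# C's from the Fisher-Doolittle solution and the correlations of Z with X1, X2, X3.
C = [
    [1.305, -0.117, -0.559],
    [-0.117, 1.420, -0.710],
    [-0.559, -0.710, 1.650],
]
r_z = [0.325, 0.508, 0.502]

# Standard partial regression weights: b* = C r.
b_star_z = [sum(c * r for c, r in zip(row, r_z)) for row in C]

# Squared multiple correlation: R^2 = sum of r_zk * b*_k.
R2 = sum(r * b for r, b in zip(r_z, b_star_z))
R = math.sqrt(R2)

# Raw-score coefficient for X1: b = b* (s_z / s_1), with s_z = 9.47 and s_1 = 7.68.
b_z1 = b_star_z[0] * 9.47 / 7.68

print([round(b, 3) for b in b_star_z], round(R2, 4), round(R, 2), round(b_z1, 3))
```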




Tests of Significance for Partial Regression Coefficients. For 
this purpose we shall use a statistical model analogous to the one 
used in Chapter 10 for testing hypotheses about regression coeffi- 
cients in problems with only a single predictor variable. General- 
ization will be made to a subset of samples in which the values of 
the predictors are exactly the same as in the observed sample. 
It will be assumed that values of the criterion variable Y vary 
from sample to sample with mean depending on the predictors 
and described for three predictors by the equation 


(13.25) $E(Y) = \alpha + \beta_{y1.23}X_1 + \beta_{y2.13}X_2 + \beta_{y3.12}X_3$


The values of Y fluctuate around their expected value, there
being a normal distribution of Y's for each different combination
of $X_1$, $X_2$ and $X_3$, and all such Y distributions having the same
variance.

The $\beta$'s appearing in Formula (13.25) are the population parameters
for which the b's are sample estimates. We shall use the
symbol $\beta^*$ as the population parameter corresponding to the sample
b*. The hypothesis that $\beta^* = 0$ (hence $\beta = 0$), may be tested
directly by use of b*. The computation of confidence intervals
can be made more simply for the $\beta$'s than for the $\beta^*$'s.

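Symbolically the relation between the $\beta^*$'s and the b*'s is

(13.26) $\beta^*_{y1.23} = E(b^*_{y1.23})$, $\quad \beta^*_{y2.13} = E(b^*_{y2.13})$, and $\beta^*_{y3.12} = E(b^*_{y3.12})$

Then the statistics

(13.27) $\dfrac{b^*_{y1.23} - \beta^*_{y1.23}}{\sqrt{\dfrac{(1 - R^2_{y.123})\,C_{11}}{N-4}}}$, $\quad \dfrac{b^*_{y2.13} - \beta^*_{y2.13}}{\sqrt{\dfrac{(1 - R^2_{y.123})\,C_{22}}{N-4}}}$, and $\dfrac{b^*_{y3.12} - \beta^*_{y3.12}}{\sqrt{\dfrac{(1 - R^2_{y.123})\,C_{33}}{N-4}}}$

all have "Student's" distribution with $N - 4$ degrees of freedom.
If m predictors were used in the equation, $N - 4$ would be replaced
by $N - m - 1$ and the statistic would have $N - m - 1$ degrees of
freedom.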

Returning now to the problem of predicting midterm score
from the three prognostic tests, it will be of interest to test (for
each of the three regression coefficients separately) the hypothesis
that it is zero in a hypothetical population of students of which
these 98 students may be considered a random sample. The
regression equation has already been obtained in the two forms

$\dfrac{\tilde{Y} - \bar{Y}}{s_y} = .102\,\dfrac{X_1 - \bar{X}_1}{s_1} + .470\,\dfrac{X_2 - \bar{X}_2}{s_2} + .216\,\dfrac{X_3 - \bar{X}_3}{s_3}$

and $\tilde{Y} = .124X_1 + .398X_2 + .277X_3 + 20.86$

The multiple correlation coefficient related to these equations
has been found to be $R_{y.123} = .663$ and this has been shown to be
significantly different from zero. The multipliers

$C_{11} = 1.305, \quad C_{22} = 1.420, \quad C_{33} = 1.650$

have been found.
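To test the three null hypotheses the following statistics are
computed, each distributed as t with $N - m - 1 = 98 - 3 - 1 = 94$
degrees of freedom:

$\dfrac{b^*_{y1.23} - 0}{\sqrt{\dfrac{(1 - R^2)\,C_{11}}{N-4}}} = \dfrac{.102}{\sqrt{\dfrac{(.5604)(1.305)}{94}}} = 1.17$

$\dfrac{b^*_{y2.13} - 0}{\sqrt{\dfrac{(1 - R^2)\,C_{22}}{N-4}}} = \dfrac{.470}{\sqrt{\dfrac{(.5604)(1.420)}{94}}} = 5.11$

$\dfrac{b^*_{y3.12} - 0}{\sqrt{\dfrac{(1 - R^2)\,C_{33}}{N-4}}} = \dfrac{.216}{\sqrt{\dfrac{(.5604)(1.650)}{94}}} = 2.18$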
Interval estimates for these regression coefficients may be 
made by the method previously used in making an interval 
estimate for the mean and for other variables having a normal 
distribution. The standard error of $b_{y1.23}$ is

(13.28) $s_{b_{y1.23}} = \sqrt{\dfrac{(1 - R^2_{y.123})\,C_{11}}{N - 4}}\;\dfrac{s_y}{s_1}$

To state the formula in more general terms we may for convenience
drop the secondary subscripts and merely use R for the coefficient
of multiple correlation and $b_k$ for the partial regression coefficient
of Y on $X_k$. If there are m predictor variables, the standard error
of $b_k$ is

(13.29) $s_{b_k} = \sqrt{\dfrac{(1 - R^2)\,C_{kk}}{N - m - 1}}\;\dfrac{s_y}{s_k}$

Then the inequality

$t_{\alpha/2} < \dfrac{b_k - \beta_k}{s_{b_k}} < t_{1-\alpha/2}$

will produce an interval estimate for $\beta_k$ with confidence coefficient
$1 - \alpha$:

(13.30) $b_k - t_{1-\alpha/2}\sqrt{\dfrac{(1 - R^2)\,C_{kk}}{N - m - 1}}\;\dfrac{s_y}{s_k} < \beta_k < b_k + t_{1-\alpha/2}\sqrt{\dfrac{(1 - R^2)\,C_{kk}}{N - m - 1}}\;\dfrac{s_y}{s_k}$
Now let us apply this procedure to obtain a confidence interval
for each of the three $\beta$'s in the data under consideration. If a
95% confidence interval is decided upon, $\alpha = .05$ and $t_{.975} = 1.96$
may be read from a table of normal probability as there are 94
degrees of freedom. The interval estimates are:

$-.085 < \beta_{y1.23} < .328$
$.245 < \beta_{y2.13} < .550$
$.026 < \beta_{y3.12} < .526$
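A minimal sketch of the interval computation follows. The ratios $s_y/s_k$ are not listed in this passage, so the sketch recovers each one as $b_k / b^*_k$ from the rounded coefficients quoted in the text; that substitution is an assumption of the example, and the limits it prints may differ from the printed ones in the third decimal.

```python
import math

N, m, R = 98, 3, 0.663
b_star = [0.102, 0.470, 0.216]    # standard weights
b_raw = [0.124, 0.398, 0.277]     # raw-score regression coefficients
C_diag = [1.305, 1.420, 1.650]
t_crit = 1.96                     # value used in the text for 94 degrees of freedom
df = N - m - 1

intervals = []
for b, bs, c_kk in zip(b_raw, b_star, C_diag):
    se_standard = math.sqrt((1 - R ** 2) * c_kk / df)
    scale = b / bs                # assumption: stands in for s_y / s_k
    se_raw = se_standard * scale
    intervals.append((b - t_crit * se_raw, b + t_crit * se_raw))

for k, (low, high) in enumerate(intervals, start=1):
    print(f"beta_{k}: {low:.3f} to {high:.3f}")
```

An interval that includes zero corresponds to a coefficient whose t statistic falls short of the critical value.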


Elimination of One Predictor from the Regression Equation. 
Either the test of the null hypothesis or the interval estimate for
the population values of the regression coefficients would produce
a lively suspicion that $X_1$, the reading test, may not be making
any real contribution to the prediction of success in first term
statistics, since the observed $b^* = .102$ might quite possibly be a
chance departure from $\beta^* = 0$. The other two measures must be
considered to be making a significant contribution to the prediction.
Therefore it would be wise to search for a more useful substitute
for the reading test or, if testing time is at a premium, to consider
omitting it altogether. (A passing comment might be in order,
however, to the effect that while the reading test has made very
little contribution to the general prediction of marks in the
statistics course, individual students have often expressed appreciation
of the opportunity to discover a weakness by its aid. Some
students who have acquired the habit of reading very rapidly
need to learn that in this subject one must read more deliberately,
must weigh and compare and reread and reflect.)

How much would the predictive value of the regression equation
for these 98 students be reduced by dropping variable $X_1$?
For three predictors, the error sum of squares is
$(1 - R^2_{y.123})\sum(Y - \bar{Y})^2$
and for two predictors it will be $(1 - R^2_{y.23})\sum(Y - \bar{Y})^2$. The
relation between these is

(13.31) $1 - R^2_{y.23} = 1 - R^2_{y.123} + \dfrac{(b^*_{y1.23})^2}{C_{11}}$

or, in general, if there are m predictors and predictor $X_k$ is dropped
from the equation, the sum of squares for residual errors is increased
by $\dfrac{(b^*_k)^2}{C_{kk}}\sum(Y - \bar{Y})^2$.
For $X_1$ of this problem

$\dfrac{1}{C_{11}}(b^*_{y1.23})^2 = \dfrac{1}{1.305}(.102)^2 = .008$

A relative increase of only $.008\sum(Y - \bar{Y})^2$ in the error sum of
squares appears negligible.

Then $1 - R^2_{y.23} = 1 - (.663)^2 + .008 = .568$
and $R_{y.23} = \sqrt{.432} = .657$
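The effect of dropping the reading test can be checked numerically. The sketch below applies Formula (13.31) with the rounded values quoted in the text; the variable names are illustrative.

```python
import math

R_full = 0.663      # multiple correlation with all three predictors
b_star_1 = 0.102    # standard weight of the predictor being dropped
C_11 = 1.305        # its diagonal multiplier

# Relative increase in the error sum of squares when X1 is dropped.
increase = b_star_1 ** 2 / C_11

# Formula (13.31): 1 - R^2 for the two remaining predictors.
one_minus_r2_reduced = 1 - R_full ** 2 + increase
R_reduced = math.sqrt(1 - one_minus_r2_reduced)

print(round(increase, 3), round(one_minus_r2_reduced, 3), round(R_reduced, 3))
```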


Some reasons have already been suggested as to why prediction 
of achievement by means of prognostic tests can never be perfect. 
In these data 43% of the variance of midterm scores is related to
variation in the two tests $X_2$ and $X_3$, 57% to variation in other
traits. This may seem inconsequential to persons who have not 
worked with such data, but few persons who have tried to predict 
achievement scores would consider the gain in knowledge a small 
one. In interpreting the multiple correlation it should be under- 
stood that the classes were conducted in a manner designed to 
destroy the validity of the prediction. The students with the best 
prognosis were in a larger class and one which studied topics not 
covered by the other sections and not included in the class tests. 
Some of the students for whom prognosis was poor were in a very 
small class where they had more opportunity for individual 
attention, others were in a section which met four hours a week 
whereas those with better prognosis were in a large class which 
met three hours a week. 

Partial Correlation. Sometimes the computed coefficient 
of correlation between two variables is misleading because there 
is little or no intrinsic correlation between them beyond what is 
induced by their common dependence upon a third variable (or 
upon several others). One might make a very long list of traits 
which increase with age between 12 and 18 years of age, such as 
height, weight, strength of grip, vocabulary, interest in public 
affairs, money spent on wearing apparel, size of shoe, knowledge 
of public affairs, interest in the opposite sex, ability to understand 
abstract material, etc. Any two of these would almost certainly
show positive correlation, but the correlation between money 




spent on wearing apparel and ability to understand abstract 
material would probably disappear if the effect of variation in 
age could be eliminated. In his Statistical Method, S. Florence 
remarks: 


The contention that most people die in their beds and that therefore, a bed 
is the most unhealthy of places is a familiar example of failure to apply 
partial association. Going to bed is a usual consequence of being ill and
dying occurs after a period of illness. Both phenomena are normal results 
of illness and to obtain a scientific proof it would be necessary to select 
only cases of illness and within the universe of illnesses to see whether 
those going to bed really die in greater proportion than those not going 
to bed.5 


The data on the prediction of success in first term statistics 
do not display the striking contrasts which sometimes appear 
between zero order correlations and partial correlations. How- 
ever they may be utilized to demonstrate what the formula for 
partial correlation means. 

For the 98 cases of Table 13.1 the correlation between scores 
on the midterm test Y and scores on the arithmetic-algebra 
placement test was $r_{y3} = .52$. Is there a relation between these
two traits which cannot be explained in terms of their common 
relationship to scores on the artificial language test? 

Table 13.8 shows the scores on these three tests for a random 
sample of 10 cases out of Table 13.1. It also shows the predictions 
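which would be made for Y and for $X_3$ by the regression equations

$\tilde{Y} = b_{y2}X_2 + (\bar{Y} - b_{y2}\bar{X}_2)$
$\tilde{X}_3 = b_{32}X_2 + (\bar{X}_3 - b_{32}\bar{X}_2)$

and shows the errors of estimate $Y - \tilde{Y}$ and $X_3 - \tilde{X}_3$.
The correlation between these errors of estimate is

$r_{(Y-\tilde{Y})(X_3-\tilde{X}_3)} = \dfrac{\sum(Y - \tilde{Y})(X_3 - \tilde{X}_3)}{\sqrt{\sum(Y - \tilde{Y})^2\,\sum(X_3 - \tilde{X}_3)^2}} = \dfrac{116.29}{381.65} = .30$

This correlation, which has sometimes been called a net correlation,
is the partial correlation denoted by the symbol $r_{y3.2}$.
If both $X_1$ and $X_2$ had been used in the regression equation, the
correlation between

$Y - \tilde{Y} = (Y - \bar{Y}) - b_{y1.2}(X_1 - \bar{X}_1) - b_{y2.1}(X_2 - \bar{X}_2)$
and $X_3 - \tilde{X}_3 = (X_3 - \bar{X}_3) - b_{31.2}(X_1 - \bar{X}_1) - b_{32.1}(X_2 - \bar{X}_2)$

could have been found to be $r_{y3.12} = .08$.

For these ten cases the coefficients are

$r_{y3} = .47$;   $r_{y3.1} = .2$ (final digit not legible);   $r_{y3.2} = .30$;   $r_{y3.12} = .08$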


TABLE 13.8 Correlation among Residual Errors for Ten Cases
Chosen at Random from Table 13.1

Case     $X_2$    $X_3$    $Y$     $\tilde{X}_3$   $\tilde{Y}$    $X_3-\tilde{X}_3$   $Y-\tilde{Y}$   $(X_3-\tilde{X}_3)(Y-\tilde{Y})$

?         53      28      62      35.8     56.8      -7.8        5.2        -40.56
?         41      30      62      31.6     50.3      -1.6       11.7        -18.72
32        58      32      49      37.6     59.5      -5.6      -10.5         58.80
13        55      38      50      36.5     57.9       1.5       -7.9        -11.85
?         57      42      64      37.2     59.0       4.8        5.0         24.00
88        58      42      59      37.6     59.5       4.4        -.5         -2.20
?         53      37      62      35.8     56.8       1.2        5.2          6.24
?         60      43      71      38.3     60.6       4.7       10.4         48.88
97        34      33      41      29.2     46.5       3.8       -5.5        -20.90
95        52      30      43      35.5     56.2      -5.5      -13.2         72.60

Sum      521     355     563     355.1    563.1       -.1        -.1        116.3
Mean     52.1    35.5    56.3    35.51    56.31      -.01       -.01
$\sum(\text{dev})^2$   616.9   284.5   884.1    75.6    181.4     207.6      701.5

(? = case number not legible)

$r_{y3.2} = \dfrac{\sum(X_3 - \tilde{X}_3)(Y - \tilde{Y})}{\sqrt{\sum(X_3 - \tilde{X}_3)^2}\,\sqrt{\sum(Y - \tilde{Y})^2}} = \dfrac{116.3}{\sqrt{(207.6)(701.5)}} = .30$

$r_{y3.2} = \dfrac{r_{y3} - r_{y2}r_{32}}{\sqrt{1 - r^2_{y2}}\,\sqrt{1 - r^2_{32}}} = \dfrac{.47 - (.45)(.52)}{(.893)(.854)} = .30$
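The two routes to $r_{y3.2}$ shown with Table 13.8 (correlating the errors of estimate, and applying the formula to zero-order correlations) can be compared in a short sketch. The ten scores below are those of Table 13.8 as far as they can be recovered from the printed rows and column totals, so they should be treated as an assumption of the example; the helper names are illustrative.

```python
import math

# Scores for the ten cases (artificial language X2, arithmetic-algebra X3, midterm Y).
x2 = [53, 41, 58, 55, 57, 58, 53, 60, 34, 52]
x3 = [28, 30, 32, 38, 42, 42, 37, 43, 33, 30]
y = [62, 62, 49, 50, 64, 59, 62, 71, 41, 43]


def mean(values):
    return sum(values) / len(values)


def corr(u, v):
    """Product-moment correlation of two equal-length lists."""
    mu, mv = mean(u), mean(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def residuals(target, predictor):
    """Errors of estimate from the least-squares line of target on predictor."""
    mt, mp = mean(target), mean(predictor)
    slope = (sum((p - mp) * (t - mt) for p, t in zip(predictor, target))
             / sum((p - mp) ** 2 for p in predictor))
    return [t - (mt + slope * (p - mp)) for t, p in zip(target, predictor)]


# Route 1: correlate the errors of estimate Y - Y~ and X3 - X3~.
r_from_residuals = corr(residuals(y, x2), residuals(x3, x2))

# Route 2: first-order partial from zero-order correlations.
r_y3, r_y2, r_32 = corr(y, x3), corr(y, x2), corr(x3, x2)
r_from_formula = (r_y3 - r_y2 * r_32) / math.sqrt((1 - r_y2 ** 2) * (1 - r_32 ** 2))

print(round(r_from_residuals, 2), round(r_from_formula, 2))
```

The two routes are algebraically the same quantity, so any difference between them in hand computation comes from rounding the intermediate values.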


From these figures we note that variation in reading score and 
variation in artificial language score both have a considerable 
effect on the relation between algebra-arithmetic score and mid- 
term test. For corresponding data for the 98 cases, see Exercise 
13.2, question 6. 

The subscripts preceding the point are called primary subscripts;
those following the point, secondary subscripts. The
number of secondary subscripts is the order. Thus $r_{y3.12}$ is a
second-order partial; $r_{y3.1}$ and $r_{y3.2}$ are first-order partials; $r_{y3}$ is
a zero-order coefficient.

Any partial correlation coefficient can be obtained from
coefficients of the next lower order. Thus

(13.32) $r_{y3.1} = \dfrac{r_{y3} - r_{y1}r_{31}}{\sqrt{1 - r^2_{y1}}\,\sqrt{1 - r^2_{31}}}$ and $r_{y3.2} = \dfrac{r_{y3} - r_{y2}r_{32}}{\sqrt{1 - r^2_{y2}}\,\sqrt{1 - r^2_{32}}}$

If additional secondary subscripts are annexed to every r in
Formula (13.32), the result is the formula for a partial of higher
order. Thus

(13.33)
$r_{y3.12} = \dfrac{r_{y3.2} - r_{y1.2}\,r_{31.2}}{\sqrt{1 - r^2_{y1.2}}\,\sqrt{1 - r^2_{31.2}}}$

$r_{y3.21} = \dfrac{r_{y3.1} - r_{y2.1}\,r_{32.1}}{\sqrt{1 - r^2_{y2.1}}\,\sqrt{1 - r^2_{32.1}}}$

Since the arrangement of secondary subscripts is immaterial,
$r_{y3.12} = r_{y3.21}$. The two Formulas (13.33) are algebraically identical
but seldom give identical results in computation because rounding
errors take a heavy toll. To obtain a partial correlation with a
large number of secondary subscripts it is possible to work up step
by step from zero-order coefficients to first order, to second order,
etc. but the amount of labor involved increases rapidly as the order
increases. If the secondary subscripts are numerous it is usually
easier to use the Formula

(13.34) $r^2_{12.3456} = b_{12.3456}\,b_{21.3456} = b^*_{12.3456}\,b^*_{21.3456}$

obtaining each of the b*'s by a Doolittle solution.
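The step-by-step route from zero-order coefficients to higher orders lends itself to a recursive sketch. The zero-order correlations below are those quoted in the text for the 98 cases; the function `partial` and the dictionary layout are illustrative choices, not part of the original method.

```python
import math

# Zero-order correlations for the 98 cases (y = midterm score; 1, 2, 3 = predictors).
ZERO_ORDER = {
    ("y", "1"): 0.357, ("y", "2"): 0.620, ("y", "3"): 0.518,
    ("1", "2"): 0.321, ("1", "3"): 0.477, ("2", "3"): 0.539,
}


def zero_order(a, b):
    """Look up r_ab regardless of the order in which the pair is given."""
    return ZERO_ORDER[(a, b)] if (a, b) in ZERO_ORDER else ZERO_ORDER[(b, a)]


def partial(a, b, controls=()):
    """Partial correlation r_ab.controls, built from the next lower order."""
    if not controls:
        return zero_order(a, b)
    rest, k = tuple(controls[:-1]), controls[-1]
    r_ab = partial(a, b, rest)
    r_ak = partial(a, k, rest)
    r_bk = partial(b, k, rest)
    return (r_ab - r_ak * r_bk) / math.sqrt((1 - r_ak ** 2) * (1 - r_bk ** 2))


print(round(partial("y", "3", ("1",)), 3))       # first-order partial r_y3.1
print(round(partial("y", "3", ("2",)), 3))       # first-order partial r_y3.2
print(round(partial("y", "3", ("1", "2")), 3))   # second-order partial r_y3.12
```

Working in full floating-point precision, the two orderings of the secondary subscripts agree; the disagreement mentioned in the text arises only when intermediate values are rounded by hand.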

The study of partial correlation suggests one reason why 
correlation coefficients are difficult to interpret. There is no rule 
which the research worker can follow to tell him whether he needs 
a zero-order correlation or a partial. He is obliged to answer that
question by intense and critical thinking about his data. Some- 
times the zero-order r is really spurious, sometimes the partial, 
depending on the use to be made of it. The zero-order correlation 
of intelligence quotient with reading ability for all the children 
in a school system will be positive and is likely to be fairly high. 
However, if computed for children in the fourth grade only the 
correlation between intelligence quotient and reading ability is 
likely to be negative because within the single grade the brighter 
children are the younger and in the fourth grade the older children 
have a certain advantage in reading. Here the variable of "grade
in school" has been literally held constant. The effect on the
correlation coefficient is similar to, but not identical with, the
effect of taking a partial in which a secondary subscript denotes
"grade in school."

Test of Significance for a Partial Correlation. All the tests
of significance for a zero-order correlation described in Chapter 10
may be applied to a partial if the number of degrees of freedom
is decreased by 1 for each secondary subscript.

Relation of Partial and Multiple Correlation Coefficients.
Using a 4-predictor problem as illustration, the multiple R may
be written in terms of partial r's and a zero-order r as follows:

(13.35)
$1 - R^2_{y.1234} = (1 - r^2_{y1})(1 - r^2_{y2.1})(1 - r^2_{y3.12})(1 - r^2_{y4.123})$
$\qquad = (1 - r^2_{y2})(1 - r^2_{y3.2})(1 - r^2_{y4.23})(1 - r^2_{y1.234})$
$\qquad = (1 - r^2_{y4})(1 - r^2_{y3.4})(1 - r^2_{y2.34})(1 - r^2_{y1.234})$
etc.

This formula does not provide a very economical computing
routine but it furnishes some interesting comparisons which enable
us to set a lower bound for a multiple R without computing it.
Each of the parentheses on the right of Formula (13.35) contains
a quantity which is smaller than 1 (except in the unusual case
of $r = 0$). The product of several such quantities is smaller than
any one of them. Therefore

$1 - R^2_{y.1234} < 1 - r^2_{y4.123}$
and $R_{y.1234} > r_{y4.123}$

The right-hand member of (13.35) could be written in 4! = 24
different ways, all algebraically identical, if the various partials
are expressed in terms of zero-order r's. These 24 ways of writing
the formula would involve every r having y as one of its primary
subscripts, and each of these r's would be smaller than $R_{y.1234}$.
However, there is no necessary relation between $R_{y.1234}$ and other
r's for which y is not a primary subscript, as for example, $r_{23.14}$.
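The product form of Formula (13.35) can be tried on the three-predictor data of this chapter. The sketch below builds $1 - R^2$ as a product of terms $(1 - r^2)$ for two different orderings of the predictors and confirms that the multiple R is at least as large as every partial that has y as a primary subscript. The correlations are those quoted in the text; the function names are illustrative.

```python
import math
from itertools import permutations

ZERO_ORDER = {
    ("y", "1"): 0.357, ("y", "2"): 0.620, ("y", "3"): 0.518,
    ("1", "2"): 0.321, ("1", "3"): 0.477, ("2", "3"): 0.539,
}


def zero_order(a, b):
    return ZERO_ORDER[(a, b)] if (a, b) in ZERO_ORDER else ZERO_ORDER[(b, a)]


def partial(a, b, controls=()):
    """Partial correlation r_ab.controls by recursion on the last control."""
    if not controls:
        return zero_order(a, b)
    rest, k = tuple(controls[:-1]), controls[-1]
    r_ab, r_ak, r_bk = partial(a, b, rest), partial(a, k, rest), partial(b, k, rest)
    return (r_ab - r_ak * r_bk) / math.sqrt((1 - r_ak ** 2) * (1 - r_bk ** 2))


def one_minus_r2(order):
    """Product of (1 - r^2) terms, annexing one more secondary subscript each time."""
    product = 1.0
    for i, k in enumerate(order):
        product *= 1 - partial("y", k, tuple(order[:i])) ** 2
    return product


p_forward = one_minus_r2(("1", "2", "3"))
p_reverse = one_minus_r2(("3", "2", "1"))
multiple_r = math.sqrt(1 - p_forward)
print(round(p_forward, 4), round(p_reverse, 4), round(multiple_r, 3))

# Every partial with y as a primary subscript is bounded by the multiple R.
bounded = all(
    abs(partial("y", order[i], tuple(order[:i]))) <= multiple_r
    for order in permutations(("1", "2", "3"))
    for i in range(3)
)
print(bounded)
```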

Consistency Among Coefficients. The relations between zero- 
order and partial coefficients are less definite than that which 
provides a lower bound for the multiple correlation. Occasionally 
a research worker encounters a partial coefficient which is larger 
than the zero-order coefficient with the same primary subscripts. 
For example $r_{12} = .30$, $r_{13} = .10$ and $r_{23} = .80$ would yield $r_{12.3} = .37$.
More often in the data with which social scientists deal the partial
coefficients are smaller than the related zero-order coefficients.
Occasionally $r_{13}$ and $r_{13.2}$ have opposite signs. For example for
the r's just quoted $r_{13} = .10$ and $r_{13.2} = -.245$. The three coefficients
$r_{12}$, $r_{13}$ and $r_{23}$ are always related in such a way that none
of the partials is numerically larger than 1. This is called the
consistency relation and it amounts to the requirement that

(13.36) $1 - r^2_{12} - r^2_{13} - r^2_{23} + 2r_{12}r_{13}r_{23} > 0$

Thus $r_{12} = .80$, $r_{13} = .60$, $r_{23} = -.20$ are inconsistent because they
would lead to partials larger than 1, and

$1 - r^2_{12} - r^2_{13} - r^2_{23} + 2r_{12}r_{13}r_{23} = -.232$
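The consistency relation is easy to evaluate directly. The sketch below computes the left-hand member of (13.36) and the first-order partials for the two sets of coefficients discussed in the text; the function names are illustrative.

```python
import math


def consistency(r12, r13, r23):
    """Left-hand member of the consistency relation (13.36)."""
    return 1 - r12 ** 2 - r13 ** 2 - r23 ** 2 + 2 * r12 * r13 * r23


def first_order_partial(r_ab, r_ak, r_bk):
    """r_ab.k from the three zero-order coefficients."""
    return (r_ab - r_ak * r_bk) / math.sqrt((1 - r_ak ** 2) * (1 - r_bk ** 2))


# A consistent set: r12 = .30, r13 = .10, r23 = .80.
print(round(consistency(0.30, 0.10, 0.80), 3))
print(round(first_order_partial(0.30, 0.10, 0.80), 2))    # r12.3
print(round(first_order_partial(0.10, 0.30, 0.80), 3))    # r13.2

# An inconsistent set: r12 = .80, r13 = .60, r23 = -.20.
print(round(consistency(0.80, 0.60, -0.20), 3))
print(round(first_order_partial(0.80, 0.60, -0.20), 2))   # r12.3 exceeds 1
```

A negative value of the check quantity signals that no real set of scores could produce the three coefficients together.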


EXERCISE 13.2 

1. Write the formula for each of the following partial-correlation 
coefficients in terms of zero-order coefficients: (а) res; (b) rus; (©) Tarv 

2. Write the formula for ras in terms of first-order partials having 
5 as secondary subscript. 

3. Write the formula for ris; in terms of third-order partials having 
2, 5, and 6 as secondary subscripts. 

4. Write the formula for $r_{43.987}$ in terms of the related regression coefficients.

5. Write the formula for $R_{3.125}$ in the manner of Formula (13.35).

6. For the data for the 98 cases of Table 13.1, using the zero-order
correlation coefficients in Table 13.3, verify the following partial coefficients:

$r_{y3} = .518 \qquad r_{y3.1} = .424 \qquad r_{y3.2} = .278 \qquad r_{y3.12} = .218$


Iterative Method of Obtaining Regression Weights. An iterative
method of obtaining the b*'s for a multiple regression
equation, usually called the Kelley-Salisbury Method (references 7, 8, 9),
will be described. It is useful only when some small errors in the
b*'s can be tolerated. When the number of predictor variables
is small it does not save labor. Those who like to use iterative
methods consider that when the predictor variables are numerous,
this method is quicker than the Doolittle method. However it
does not provide any method of testing the significance of the
b*'s individually. It will be illustrated here from the data already
used for the Doolittle solution.

The Kelley-Salisbury method is based on the normal equations.
For the present data these are

$b^*_{y1.23} + .321\,b^*_{y2.13} + .477\,b^*_{y3.12} = .357$
$.321\,b^*_{y1.23} + b^*_{y2.13} + .539\,b^*_{y3.12} = .620$
$.477\,b^*_{y1.23} + .539\,b^*_{y2.13} + b^*_{y3.12} = .518$

The procedure is to make a rough estimate of the values of
the b*'s (called $W_1$ on line 2 of the table below), substitute those
estimates in the normal equations and obtain a first approximation to the
correlations with the criterion (called $r_1$ on line 3). Now, comparing
the first approximations $r_1$ with the actual correlations we see
that the approximation is close for $r_{y2}$ but is considerably too large
for $r_{y1}$ and $r_{y3}$. We shall try reducing the estimate for one or the
other of the corresponding b*'s. Since $r_{y1}$ shows the largest
discrepancy we will reduce the entry for column 1. It is not necessary
at this stage to seek for great precision, so we will reduce
the estimate from .2 to .1 and will record $.1 - .2 = -.1$ as the
amount of the change in column 1. It is now possible to substitute
the new weights .1, .4 and .3 in the normal equations or merely to
note that the estimate of $r_{y1}$ will be changed by $-.1$, the estimate
of $r_{y2}$ by $-.1(.321)$ and the estimate of $r_{y3}$ by $-.1(.477)$, producing
the second estimates $r_2$ recorded in row 4. The iterative process
may be carried on until agreement between estimated r's and
observed r's is close enough to satisfy the computer.


                                                   Estimate                    Change
Step                                          X1       X2       X3        X1       X2       X3

 1. r_yi = correlation with criterion        .357     .620     .518
 2. W_1 = first estimate of b*               .2       .4       .3
 3. r_1 = first approximation to r_yi        .4715    .6259    .6110     -.10
 4. r_2                                      .3715    .5938    .5633                       -.05
 5. r_3                                      .3476    .5669    .5133              +.05
 6. r_4                                      .3637    .6169    .5403
 7. W_2 = revised estimate of b*             .10      .45      .25
 8. r_4 (check)                              .3637    .6169    .5403                       -.022
 9. r_5                                      .3532    .6050    .5183              +.015
10. r_6                                      .3580    .6200    .5264                       -.008
11. r_7                                      .3542    .6157    .5184     +.003
12. r_8                                      .3572    .6167    .5197
13. W_3 = revised estimate of b*             .103     .465     .220
14. r_8 (check)                              .3572    .6167    .5197              +.005
15. r_9                                      .3588    .6217    .5224                       -.004
16. r_10                                     .3569    .6195    .5184
17. W_4 = final estimate of b*               .103     .470     .216
18. b* = value obtained from Doolittle
    solution, presented here for
    comparison                               .102     .470     .216
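The bookkeeping of the iterative method can be sketched in code. The first part replays the sequence of changes recorded in the worked table, updating the approximate correlations after each change instead of re-substituting in the normal equations. The second part, `relax`, is an assumption of this sketch and not part of the method as described: it replaces the computer's judgment with a fixed rule (always correct the weight whose approximation shows the largest discrepancy, by the full amount of that discrepancy).

```python
# Correlations among the predictors and with the criterion, as quoted in the text.
R = [
    [1.0, 0.321, 0.477],
    [0.321, 1.0, 0.539],
    [0.477, 0.539, 1.0],
]
r_obs = [0.357, 0.620, 0.518]


def approximations(weights):
    """Substitute trial weights in the normal equations."""
    return [sum(R[i][j] * weights[j] for j in range(3)) for i in range(3)]


def apply_change(weights, approx, k, delta):
    """Change weight k by delta and update every approximation by delta * r_ik."""
    weights[k] += delta
    for i in range(3):
        approx[i] += delta * R[i][k]


# Replay of the worked example: (predictor index, change) in the order used there.
changes = [(0, -0.10), (2, -0.05), (1, 0.05), (2, -0.022), (1, 0.015),
           (2, -0.008), (0, 0.003), (1, 0.005), (2, -0.004)]
weights = [0.2, 0.4, 0.3]
approx = approximations(weights)
for k, delta in changes:
    apply_change(weights, approx, k, delta)
print([round(w, 3) for w in weights], [round(a, 4) for a in approx])


def relax(start, tolerance=1e-6, max_steps=10000):
    """Hypothetical automatic rule: repeatedly remove the largest discrepancy."""
    w = list(start)
    a = approximations(w)
    for _ in range(max_steps):
        gaps = [r_obs[i] - a[i] for i in range(3)]
        k = max(range(3), key=lambda i: abs(gaps[i]))
        if abs(gaps[k]) < tolerance:
            break
        apply_change(w, a, k, gaps[k])
    return w


auto_weights = relax([0.2, 0.4, 0.3])
print([round(w, 3) for w in auto_weights])
```

The automatic rule works here because each diagonal coefficient of the normal equations is 1, so changing a weight by the discrepancy removes that discrepancy exactly while disturbing the other two approximations by smaller amounts.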


REFERENCES

1. Doolittle, M. H., "Method Employed in the Solution of Normal Equations
and the Adjustment of a Triangulation," U.S. Coast and Geodetic Survey
Report, 1878, 115-120.

2. Dwyer, P. S., "The Doolittle Technique," Annals of Mathematical Statistics,
12 (1941), 449-458.

3. Dwyer, P. S., "The Solution of Simultaneous Equations," Psychometrika,
6 (1941), 101-129.

4. Dwyer, P. S., Linear Computations, New York, John Wiley and Sons, 1951.

5. Florence, P. S., The Statistical Method in Economics and Political Science,
New York, Harcourt Brace, 1929.

6. Kelley, T. L., Fundamentals of Statistics, Cambridge, Mass., Harvard
University Press, 1947, Chapter 12.

7. Kelley, T. L. and McNemar, Quinn, "Doolittle versus the Kelley-Salisbury
Iteration Method for Computing Multiple Regression Coefficients," Journal
of the American Statistical Association, 24 (1929), 164-169.

8. Kelley, T. L. and Salisbury, F. S., "An Iteration Method for Determining
Multiple Correlation Coefficients," Journal of the American Statistical
Association, 21 (1926), 282-292.

9. Tolley, H. R. and Ezekiel, M., "The Doolittle Method for Solving Multiple
Correlation Equations versus the Kelley-Salisbury Iteration Method," Journal
of the American Statistical Association, 22 (1927), 497-500.

10. Walker, H. M. and Durost, W. N., "A Model to Aid in Teaching Partial
Correlation," The American Statistician, 4 (October, 1950), 5-7.


14 Analysis of Variance with
Two or More Variables
of Classification


Some of the simpler applications of analysis of vari- 
ance were considered in Chapter 9. This type of analysis may 
be applied to a great variety of complex experiments, some of 
which provide a method of studying the effects of two or more 
factors in the same experiment. As was the case in Chapters 7 
and 9 it will be necessary to distinguish between situations in 
which only one observation is made on each individual and situa- 
tions in which several observations are made on each individual. 

Two Bases of Classification with n Individuals in Each Cell. 
An experiment on the teaching of reading to backward readers is 
described by Burt and Lewis‘. A group of 48 backward readers 
was given instruction in reading by four methods called A. Alphabetic,
B. Kinesthetic, C. Phonic, and D. Visual. In the following
table, the improvement achieved by each of the pupils is recorded 
as 100 times the ratio of his final to his original reading score. 
Each ratio is classified both by the method used in remedial 
instruction and by the method previously used in the school which 
the pupil attended regularly. These ratios multiplied by 100 will 
now, for the sake of simplicity, be called scores. Note that the 
scores appear in 16 groups, three scores in a group. One of the 
groups corresponds to each possible pairing of method of instruc- 
tion used in school and the method used in remedial teaching. 
The sixteen groups are customarily called subclasses. Since the 
subclasses are arranged in rows and columns it is convenient to 
speak of the groups of subclasses as rows and columns. 

The advantage of this arrangement or experimental design 
is that each remedial method is applied equally often to students 
who have been taught previously by each of the four methods. 
Thus the outcomes of remedial instruction are unaffected by 
peculiarities of prior instruction. It is important that the 12 pupils 




in each column be assigned randomly to the four remedial methods 
to assure that variation in individual abilities does not bias the 
results of the experiment. 

Before writing the necessary formulas it will be helpful to
review the notation previously used in similar situations. A score
in the subclass in the ith row and jth column will be denoted by
the symbol $X_{ij\alpha}$ so that the three scores in the first row and second
column are $X_{121} = 114.2$, $X_{122} = 101.6$ and $X_{123} = 113.0$. The mean
of the scores in a subclass will be denoted by the symbol $\bar{X}_{ij}$, the
mean of a row by $\bar{X}_{i.}$, the mean of a column by $\bar{X}_{.j}$ and the mean
of all scores by $\bar{X}$. The corresponding population means are $\mu_{ij}$,
$\mu_{i.}$, $\mu_{.j}$ and $\mu$.

The usual assumptions of normality and equality of standard 
deviation are made. All possible observations which may appear 
in a subclass are assumed to be independently and normally 
distributed about the mean, $\mu_{ij}$, of the subclass, and each of these
populations of observations is assumed to have the same standard
deviation $\sigma$. Thus each group of three observations in a subclass
of Table 14.1 is a sample from a normal population with the mean 
and standard deviation indicated above. 

By use of the parameters three hypotheses can be formulated: 

1. The hypothesis that the row means are equal. This hypothesis
may be denoted by the symbol $H_r$. For the data in Table 14.1
the hypothesis is

$H_r: \mu_{1.} = \mu_{2.} = \mu_{3.} = \mu_{4.} = \mu$

In terms of these data the hypothesis implies that in the population
the four remedial methods are equally effective.

2. The hypothesis that the column means are equal. This hypothesis
may be written

$H_c: \mu_{.1} = \mu_{.2} = \mu_{.3} = \mu_{.4} = \mu$

In terms of the data in Table 14.1 this hypothesis implies that
students improve equally well under remedial instruction regardless
of the method of teaching originally used in their schools.

3. The hypothesis that the interaction is zero. In terms of the
parameters this hypothesis may be written

$H_{rc}: \mu_{ij} = (\mu_{i.} - \mu) + (\mu_{.j} - \mu) + \mu$
or $\mu_{ij} - \mu_{i.} - \mu_{.j} + \mu = 0$.

The interpretation of this hypothesis is that each subclass mean
($\mu_{ij}$) can be found from the general mean by adding to the latter

TABLE 14.1 Improvement Scores of Pupils Given Remedial Instruction in Reading
Classified According to Method of Original Instruction and Method of Remedial
Instruction*

Method of                 Improvement scores of pupils receiving
remedial                  original instruction by a given method
instruction               A         B         C         D         Sum       Mean

A. Alphabetic            98.7     114.2     102.8     103.2
                        109.4     101.6      96.1     111.5
                        100.2     113.0     110.2     106.4
   Sum                  308.3     328.8     309.1     321.1     1267.3    105.61
   Mean                 102.77    109.6     103.03    107.03

B. Kinesthetic          118.6     102.5     113.5     103.9
                        106.0     106.7     117.4     119.1
                        116.1     110.4     116.8     116.2
   Sum                  340.7     319.6     347.7     339.2     1347.2    112.27
   Mean                 113.57    106.53    115.9     113.07

C. Phonic               107.5     106.4     111.2     101.6
                        112.6      98.4     100.8     105.5
                        105.3      93.6     109.6      94.7
   Sum                  325.4     298.4     321.6     301.8     1247.2    103.93
   Mean                 108.47     99.47    107.2     100.6

D. Visual               128.1     113.4     119.8     101.2
                        119.0     119.2     106.6     108.0
                        126.5     111.3     107.9     107.6
   Sum                  373.6     343.9     334.3     316.8     1368.6    114.05
   Mean                 124.53    114.63    111.43    105.6

Grand Total            1348.0    1290.7    1312.7    1278.9     5230.3
Grand Mean              112.33    107.56    109.39    106.58    108.96

* From Burt and Lewis.


the amount by which the row mean deviates from the general
mean ($\mu_{i.} - \mu$) and the amount by which the column mean deviates
from the general mean ($\mu_{.j} - \mu$), row and column being those in
which the subclass is located. In terms of the data in Table 14.1
the hypothesis means that the variation from subgroup to subgroup
may be explained by the addition of components due to the
method of school instruction and method of remedial instruction;
that the effectiveness of a method of remedial instruction is the
same regardless of what method preceded it.

If the hypothesis $H_{rc}$ is true, then we say that there is no
interaction between the row and column components or that each
exercises an influence apart from the other. If this hypothesis
is not true we say that the interaction of the components produces
effects which cannot be explained merely by adding the row and
column components. The reader will notice that the hypothesis
$H_{rc}$ was also formulated in Chapter 9. In that chapter there was
only one case in each subclass and it was, therefore, impossible
to test for interaction. In this chapter we shall be concerned with
situations where there are several cases in each subclass and a test
for $H_{rc}$ will be made.

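To test the three hypotheses just formulated we shall subdivide
into components the total sum of squares of deviations of the
individual scores from the mean of all the scores. In Table 14.2
this subdivision will be shown, together with the degrees of
freedom appropriate to each component and a further subdivision
of the component sums of squares to simplify computations.
We shall suppose in the table that there are r rows, c columns and
n cases in each subclass. Hence the total number of cases is
$nrc$.

It will be convenient to use the notation: $T_{ij}$ = the sum of the
scores in the subsample which is located in the ith row and jth
column. $T_{i.}$ is the sum of all the scores in the ith row and $T_{.j}$
the sum in the jth column. $T$ is the sum of all the scores.

TABLE 14.2 Symbolic Description of Analysis of Variance in a Two-way
Layout With Equal Numbers of Cases in the Subclasses

Source of variation: Total
  Degrees of freedom: $nrc - 1$
  Sum of squares: $\sum_i\sum_j\sum_\alpha (X_{ij\alpha} - \bar{X})^2$
  Computational formula: $\sum_i\sum_j\sum_\alpha X^2_{ij\alpha} - \dfrac{T^2}{nrc}$

Source of variation: Rows
  Degrees of freedom: $r - 1$
  Sum of squares: $nc\sum_i (\bar{X}_{i.} - \bar{X})^2$
  Computational formula: $\dfrac{1}{nc}\sum_i T^2_{i.} - \dfrac{T^2}{nrc}$

Source of variation: Columns
  Degrees of freedom: $c - 1$
  Sum of squares: $nr\sum_j (\bar{X}_{.j} - \bar{X})^2$
  Computational formula: $\dfrac{1}{nr}\sum_j T^2_{.j} - \dfrac{T^2}{nrc}$

Source of variation: Interaction
  Degrees of freedom: $(r - 1)(c - 1)$
  Sum of squares: $n\sum_i\sum_j (\bar{X}_{ij} - \bar{X}_{i.} - \bar{X}_{.j} + \bar{X})^2$
  Computational formula: $\dfrac{1}{n}\sum_i\sum_j T^2_{ij} - \dfrac{1}{nc}\sum_i T^2_{i.} - \dfrac{1}{nr}\sum_j T^2_{.j} + \dfrac{T^2}{nrc}$

Source of variation: Within subclasses
  Degrees of freedom: $rc(n - 1)$
  Sum of squares: $\sum_i\sum_j\sum_\alpha (X_{ij\alpha} - \bar{X}_{ij})^2$
  Computational formula: $\sum_i\sum_j\sum_\alpha X^2_{ij\alpha} - \dfrac{1}{n}\sum_i\sum_j T^2_{ij}$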

The computational formulas in Table 14.2 will now be applied
to the data in Table 14.1. The calculations will be set out in a
number of steps.

(14.1) Correction term:

$\dfrac{T^2}{nrc} = \dfrac{(5230.3)^2}{48} = 569{,}917.5$

(14.2) Total sum of squares:

$\sum\sum\sum X^2_{ij\alpha} - \dfrac{T^2}{nrc} = (98.7)^2 + (109.4)^2 + \cdots + (107.6)^2 - 569{,}917.5 = 2883.2$

(14.3) Sum of squares for rows (remedial methods):

$\dfrac{1}{nc}\sum_i T^2_{i.} - \dfrac{T^2}{nrc} = \dfrac{1}{12}\left[(1267.3)^2 + (1347.2)^2 + (1247.2)^2 + (1368.6)^2\right] - 569{,}917.5$
$= 570{,}797.6 - 569{,}917.5 = 880.1$

(14.4) Sum of squares for columns (methods used in school):

$\dfrac{1}{nr}\sum_j T^2_{.j} - \dfrac{T^2}{nrc} = \dfrac{1}{12}\left[(1348.0)^2 + (1290.7)^2 + (1312.7)^2 + (1278.9)^2\right] - 569{,}917.5$
$= 570{,}148.1 - 569{,}917.5 = 230.6$

(14.5) Sum of squares for interaction:

$\dfrac{1}{n}\sum_i\sum_j T^2_{ij} - \dfrac{1}{nc}\sum_i T^2_{i.} - \dfrac{1}{nr}\sum_j T^2_{.j} + \dfrac{T^2}{nrc}$
$= 571{,}793.1 - 570{,}797.6 - 570{,}148.1 + 569{,}917.5 = 764.9$

(14.6) Sum of squares for variation within subclasses:

$\sum\sum\sum X^2_{ij\alpha} - \dfrac{1}{n}\sum_i\sum_j T^2_{ij} = (98.7)^2 + (109.4)^2 + \cdots + (107.6)^2 - \dfrac{1}{3}\left[(308.3)^2 + (328.8)^2 + \cdots + (316.8)^2\right]$
$= 572{,}800.7 - 571{,}793.1 = 1007.6$

As in Chapter 9, the F distribution will be used to test each of the 
three hypotheses. First, each of the three sums of squares must 
be divided by the appropriate number of degrees of freedom to 
produce a mean square. These mean squares are listed in Table 
14.3 where the results of the preceding computations have been 
summarized. The values shown in the column headed F in that 
table were obtained by dividing each of the other mean squares 
by the subclass mean square 31.5. Values of $F_{.95}$ and $F_{.99}$ read
from the probability tables with the appropriate degrees of
freedom have been entered in the columns at the right of Table
14.3.

TABLE 14.3 Analysis of Variance of Improvement Scores Shown in Table 14.1

Source of                        Sum of     Degrees of     Mean
variation                        squares    freedom        square      F       F.95     F.99

Methods of remedial
  instruction                     880.1        3           293.4      9.31     2.90     4.46
Methods of original
  teaching                        230.6        3            76.9      2.44     2.90     4.46
Interaction                       764.9        9            85.0      2.70     2.19     3.01
Individuals within
  subclasses                     1007.6       32            31.5
To test the hypothesis that remedial methods give equal
results, $F = 9.31$ is compared with $F_{.99} = 4.46$. It is evident that
the null hypothesis can be rejected at the .01 level. (In fact, it
can even be rejected at the .001 level, for $F_{.999}$ is only 6.94, but
that value is not recorded in the F tables provided in this text.)
It must therefore be assumed that remedial methods do differ
in their effectiveness.

To test the second hypothesis, $F = 2.44$ may be compared with
$F_{.95} = 2.90$. It must be concluded that there is no justification
for asserting that the methods originally used in teaching reading
differentiate the groups after they have received remedial
instruction.

To test the third hypothesis, $F = 2.70$ may be compared with
$F_{.95} = 2.19$ and $F_{.99} = 3.01$. At the .05 level of significance the null
hypothesis would be rejected. A very cautious person, preferring
to work at the .01 level, might retain the null hypothesis. That
hypothesis is that the final results after remedial work can be
explained by the additive effects of the methods originally used
in school and the methods used in remedial teaching. As most
people will probably use the .05 level of significance, they will
conclude that some combinations of the two methods produce
better results than others.
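The three tests just described can be retraced with a short Python
sketch. The sums of squares, degrees of freedom and critical values are
copied from Table 14.3; the variable and function names are assumptions
made for the illustration, and the critical values are read from the
table rather than computed.

```python
# Sums of squares and degrees of freedom as reported in Table 14.3.
sources = {
    "remedial methods": (880.1, 3),
    "original methods": (230.6, 3),
    "interaction": (764.9, 9),
}
within_ss, within_df = 1007.6, 32

# Tabled critical values (F.95, F.99) copied from Table 14.3.
critical = {
    "remedial methods": (2.90, 4.46),
    "original methods": (2.90, 4.46),
    "interaction": (2.19, 3.01),
}

error_ms = within_ss / within_df  # mean square within subclasses


def f_ratio(ss, df, error_mean_square):
    """Mean square for a source divided by the error mean square."""
    return (ss / df) / error_mean_square


results = {}
for name, (ss, df) in sources.items():
    f = f_ratio(ss, df, error_ms)
    f95, f99 = critical[name]
    results[name] = (f, f > f95, f > f99)
    print(f"{name:18s} F = {f:5.2f}  exceeds F.95: {f > f95}  exceeds F.99: {f > f99}")
```

Because the sketch divides by the unrounded error mean square, its F
values may differ from the tabled ones in the last decimal place.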
Having discovered a significant interaction, the research
worker naturally wants to discover its source. If he is an
experienced research worker he will have foreseen this possibility and
will have tried to think ahead of time of meaningful comparisons
which might be made among groups of subclasses. The person
planning this experiment would know something about learning
theory and would therefore probably decide that, if interaction
should prove significant, he would make a comparison of the gain of the
12 pupils for whom the remedial method was the same as the
original method and the gain of the 36 pupils for whom the method
was changed. Even if he did not think of this idea before he saw 
the data, he might notice — while poring over Table 14.1 — that 
in each column the mean in the diagonal cell, that is the mean 
under repetition of method, is smaller than the grand mean for the 
column. If the explanation concerning the source of the inter- 
action occurred to him before he saw the data, he could test the 
null hypothesis in standard manner. If the explanation was 
suggested by inspection of the data he would probably present it as 
intelligent speculation not verified by a statistical test. The good 
research worker habitually speculates about matters which go 
beyond his data, but he is at great pains to make clear when he is 
speculating and when he is reporting. 

4. The hypothesis that the mean of the 4 diagonal cells is equal 
to the mean of the 12 non-diagonal cells. The data in Table 14.1 
show the following: 


Mean of 12 scores in diagonal cells = 105.5 
Mean of 36 scores in non-diagonal cells = 110.1 


These means suggest that greater gain is achieved when the 
remedial method differs from the original method of instruction 
than when the two methods are the same. 

To test the significance of this difference between the means
we can apply "Student's ratio" as in Chapter 7. The standard
error of the difference is

$$\sigma\sqrt{\frac{1}{12} + \frac{1}{36}}$$

As an estimate of $\sigma^2$ we use the error mean square from Table 14.3,
so that

$$s = \sqrt{31.5}$$

Then "Student's ratio" is

$$t = \frac{110.1 - 105.5}{\sqrt{31.5}\,\sqrt{\frac{1}{12} + \frac{1}{36}}} = 2.45$$




The probability is less than .02 that a difference as great as the one
observed would occur as a result of random sampling
from populations in which the means are equal. The difference
may, therefore, be regarded as significant from the point of view
of random sampling.
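The computation of "Student's ratio" can be sketched as follows. The
function name is an assumption made for the illustration; the totals
1266.3 and 3964.0 for the diagonal and non-diagonal cells are the ones
quoted in the next section, and 31.5 is the error mean square of
Table 14.3.

```python
import math


def student_ratio(total_1, n_1, total_2, n_2, error_mean_square):
    """Difference between two means divided by its estimated standard error."""
    mean_1 = total_1 / n_1
    mean_2 = total_2 / n_2
    standard_error = math.sqrt(error_mean_square) * math.sqrt(1 / n_1 + 1 / n_2)
    return (mean_2 - mean_1) / standard_error


# Diagonal cells: 12 scores; non-diagonal cells: 36 scores.
t = student_ratio(1266.3, 12, 3964.0, 36, 31.5)
print(round(t, 2))
```

Working from the totals rather than from the rounded means avoids a
small rounding discrepancy in the second decimal place.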

Subdivision of Sum of Squares. The comparison just com- 
pleted can be made in another way which leads to the conclusion 
reached above and which provides additional information. 

Call $T_1$ the sum of the scores in the diagonal cells and $T_2$ the
sum in the remaining cells. The comparison can be made by
using $T_1$ and $T_2$ directly if the difference in the number of cases
is taken into account. Since $T_1$ is based on 12 cases and $T_2$ on
36, the comparison of totals can be made if we equalize the number
of cases by writing $3T_1 - T_2$.

We can now form the sum of squares due to this comparison
by the expression

$$\frac{(3T_1 - T_2)^2}{12(3)^2 + 36(-1)^2}$$

Since $T_1 = 1266.3$ and $T_2 = 3964.0$ this sum of squares is 189.3.

If this sum of squares is subtracted from the total sum of 
squares for interaction the difference provides an independent 
component of sum of squares with value 764.9 — 189.3 = 575.6. 
Each of these sums of squares can be tested for significance by
using as error variance the mean square for variation among 
individuals within subclasses which appears in Table 14.3. The 
analysis of variance appears in Table 14.4. 


TABLE 14.4 Subdivision of Interaction in Experiment on Reading

Source of variation                        Sum of     d.f.    Mean       F      F.95
                                           squares            square

Between diagonal cells and all others       189.3       1     189.3     6.01    4.15
Other sources of variation among cells      575.6       8      72.0     2.29    2.25
Individuals within cells                   1007.6      32      31.5

Both F ratios are significant at the 5% level. It appears that 
there are significant elements of interaction other than the differ- 
ence between diagonal and non-diagonal sums. 
It is interesting to compare the result of this section with that
of the previous section. In the previous section a ratio of 2.45
was found, but $(2.45)^2 = 6.00$ and this differs from the F ratio
of 6.01 only because of rounding errors. It is evident that the
computing procedures of the two sections lead to the same
conclusion, but the procedure of this section provides a means of
subdividing a sum of squares. Such subdivision will be discussed
more fully in the next section.
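The subdivision of the interaction sum of squares, and the agreement
between the F ratio and the square of "Student's ratio", can be checked
with a brief sketch. The function name `comparison_ss` is an assumption
made here; the numerical inputs are those given in the text.

```python
def comparison_ss(weights, totals, counts):
    """Sum of squares, with one degree of freedom, for a comparison among totals."""
    numerator = sum(w * t for w, t in zip(weights, totals)) ** 2
    denominator = sum(n * w ** 2 for n, w in zip(counts, weights))
    return numerator / denominator


error_ms = 31.5                          # mean square within subclasses
interaction_ss, interaction_df = 764.9, 9

# Diagonal total (12 scores) against non-diagonal total (36 scores).
diag_ss = comparison_ss([3, -1], [1266.3, 3964.0], [12, 36])
rest_ss = interaction_ss - diag_ss       # remaining interaction component
rest_df = interaction_df - 1

f_diag = diag_ss / error_ms
f_rest = (rest_ss / rest_df) / error_ms

# Square of Student's ratio for the same two groups.
t_squared = (3964.0 / 36 - 1266.3 / 12) ** 2 / (error_ms * (1 / 12 + 1 / 36))

print(round(diag_ss, 1), round(rest_ss, 1))
print(round(f_diag, 2), round(f_rest, 2), round(t_squared, 2))
```

When the same error mean square is used in both procedures, the F
ratio with one degree of freedom and the squared ratio coincide, apart
from floating-point error.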

Orthogonal Comparisons. We have just seen that a combina- 
tion of totals can be made up which reflects some interesting aspect 
of a statistical problem. The combination which was considered 
provided a comparison of elements of the problem, and was 
distributed with one degree of freedom. Theoretically every 
degree of freedom in a statistical problem can be used to provide 
information of some statistical interest. From this point of view 
each of the nine degrees of freedom for interaction in the reading 
problem should provide information of interest. It is usually 
difficult, however, to determine an element of interest that can 
be drawn from each of the degrees of freedom in a statistical 
problem. 

In the following paragraphs we shall develop procedures for 
subdivision of a sum of squares into components each of which 
has one degree of freedom. The general formulas will be applied 
to a subdivision of the sum of squares of remedial reading methods. 

Some definitions will be needed. Suppose that there are $k$
totals $T_1, T_2, \ldots, T_k$ based on $N_1, N_2, \ldots, N_k$ cases respectively.
Then a combination

$$l_1 T_1 + l_2 T_2 + \cdots + l_k T_k$$

is called a comparison if

$$N_1 l_1 + N_2 l_2 + \cdots + N_k l_k = 0 \tag{14.7}$$

As an illustration recall that in the previous section we had

$$k = 2, \quad l_1 = 3, \quad l_2 = -1, \quad N_1 = 12, \quad N_2 = 36,$$

so that $N_1 l_1 + N_2 l_2 = 0$.

The sum of squares due to this comparison is

$$\frac{(l_1 T_1 + l_2 T_2 + \cdots + l_k T_k)^2}{N_1 l_1^2 + N_2 l_2^2 + \cdots + N_k l_k^2} \tag{14.8}$$

The sums of squares due to a set of comparisons will be
independent if the comparisons are orthogonal. Two comparisons
among the same totals

$$l_1 T_1 + l_2 T_2 + \cdots + l_k T_k$$

and

$$m_1 T_1 + m_2 T_2 + \cdots + m_k T_k$$

are orthogonal if

$$N_1 l_1 m_1 + N_2 l_2 m_2 + \cdots + N_k l_k m_k = 0 \tag{14.9}$$

If the number of cases is the same for all totals, that is

$$N_1 = N_2 = \cdots = N_k,$$

then a combination is a comparison if

$$l_1 + l_2 + \cdots + l_k = 0$$

and two comparisons are orthogonal if

$$l_1 m_1 + l_2 m_2 + \cdots + l_k m_k = 0$$

If as many comparisons are formed as there are degrees of 
freedom then the sums of squares of a set of orthogonal compari- 
sons constitute a complete subdivision of the total sum of squares. 
It should be noted that orthogonal sets of comparisons can be 
made up in an endless number of ways. 
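Conditions (14.7) and (14.9) are easy to check mechanically. The
following sketch uses two hypothetical helper functions,
`is_comparison` and `are_orthogonal`; the multipliers in the second
example are illustrative and are not taken from the text.

```python
def is_comparison(weights, counts, tol=1e-9):
    """Condition (14.7): the case-weighted multipliers sum to zero."""
    return abs(sum(n * w for n, w in zip(counts, weights))) < tol


def are_orthogonal(weights_l, weights_m, counts, tol=1e-9):
    """Condition (14.9): the case-weighted products of multipliers sum to zero."""
    total = sum(n * l_w * m_w
                for n, l_w, m_w in zip(counts, weights_l, weights_m))
    return abs(total) < tol


# The comparison of the previous section: multipliers 3 and -1
# applied to totals based on 12 and 36 cases.
print(is_comparison([3, -1], [12, 36]))

# Two pairs of comparisons among four totals with equal numbers of cases.
counts = [10, 10, 10, 10]
print(are_orthogonal([1, 1, -1, -1], [1, -1, 0, 0], counts))
print(are_orthogonal([1, 1, -1, -1], [1, 0, -1, 0], counts))
```

The last pair fails the test: both combinations are comparisons, but
the sum of products of their multipliers is not zero, so their sums of
squares would not be independent.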

Let us apply the formulas to obtaining a set of orthogonal
comparisons among the methods of remedial reading. From
Table 14.1 we have

Method of Instruction     Total
Alphabetic                $T_1 = 1267.3$
Kinesthetic               $T_2 = 1347.2$
Phonic                    $T_3 = 1247.2$
Visual                    $T_4 = 1368.6$


The following comparisons suggest themselves as being of
interest:

The alphabetic and phonic methods jointly in contrast with
kinesthetic and visual jointly. This comparison has the form
$T_1 + T_3 - T_2 - T_4$.

The alphabetic in contrast with the phonic method, with the
form $T_1 - T_3$.

The kinesthetic in contrast with the visual, $T_2 - T_4$.

The multipliers of $T_1$, $T_2$, $T_3$, $T_4$ which describe these
comparisons are then as follows:

$T_1$    $T_2$    $T_3$    $T_4$
  1       -1        1       -1
  1        0       -1        0
  0        1        0       -1


For any row the sum of the multipliers is zero. For any two rows
the sum of products of corresponding multipliers is zero. Hence
the conditions for orthogonality are satisfied.


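The sums of squares for the three comparisons are

$$\begin{aligned}
\frac{(T_1 + T_3 - T_2 - T_4)^2}{12 + 12 + 12 + 12} &= 844.20 \\
\frac{(T_1 - T_3)^2}{12 + 12} &= 16.83 \\
\frac{(T_2 - T_4)^2}{12 + 12} &= 19.08 \\
\text{Total} &= 880.11
\end{aligned}$$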


The reader should compare 880.11 which is the total of the 
three sums of squares with the entry 880.1 which is the sum of 
squares for methods of remedial instruction in Table 14.3. The 
two are the same except for rounding errors. 

The sum of squares for each of the three comparisons has one
degree of freedom and can be tested for significance using the
mean square for individuals within subclasses from Table 14.3 as
the error mean square. The first comparison is highly significant,
for $F = 844.20/31.5 = 26.8$ and $F_{.99} = 7.50$. Neither of the
other comparisons is significant.
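The subdivision of the sum of squares for remedial methods can be
verified with a short sketch applying formula (14.8). The function
name is an assumption made for the illustration; the four totals are
the row totals that enter step (14.3), each based on 12 scores.

```python
def comparison_ss(weights, totals, counts):
    """Sum of squares for one comparison, following formula (14.8)."""
    numerator = sum(w * t for w, t in zip(weights, totals)) ** 2
    denominator = sum(n * w ** 2 for n, w in zip(counts, weights))
    return numerator / denominator


# Totals for the alphabetic, kinesthetic, phonic and visual methods.
totals = [1267.3, 1347.2, 1247.2, 1368.6]
counts = [12, 12, 12, 12]

comparisons = {
    "alphabetic + phonic vs kinesthetic + visual": [1, -1, 1, -1],
    "alphabetic vs phonic": [1, 0, -1, 0],
    "kinesthetic vs visual": [0, 1, 0, -1],
}

error_ms = 31.5  # mean square within subclasses, Table 14.3
parts = {name: comparison_ss(w, totals, counts)
         for name, w in comparisons.items()}
for name, ss in parts.items():
    print(f"{name:45s} SS = {ss:7.2f}  F = {ss / error_ms:5.2f}")
print(f"sum of the three components = {sum(parts.values()):.2f}")
```

Because the three comparisons are orthogonal and use all three degrees
of freedom, their sums of squares add to the sum of squares for rows.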

The Estimation of Error. The denominator used in setting up 
the F ratio is called the “error mean square,” the “error variance," 
or briefly the “error.” The problem of choosing the appropriate 
sum of squares for error requires careful consideration. The 
choice differs from situation to situation. 

Mathematically the numerator mean square and denominator
mean square in F must be estimates of the same population
variance in order that the F distribution shall apply. Let us see
how this consideration applies to the problem of remedial reading.
In this problem the entire population of children sampled in the
experiment may be considered as stratified into 16 cells or
subclasses. Under each of the three null hypotheses considered above
the means which appear in the numerator mean square are affected
only by variations among individuals within the subclasses.
Consequently if the null hypothesis is true each numerator mean
square is an estimate of the same population variance as is
estimated by the variation within subclasses.

The situation was different in the problem of Chapter 9 when 
the performance of individuals at archery when shooting at the 
50-yard range in first, second and third order was considered. 
There each individual was measured on each of the three orders.
For eleven individuals there were 33 cells or subclasses with one
score in each. If that one score were really the mean of all possible
scores made by the given individual shooting at the 50-yard range 
in the given order, then the residual variance of Table 9.9 would 
be an interaction variance. Here the interaction would be between 
individual and order and would be due to the differential response 
of individuals to such factors as warming up, practice, and fatigue. 

However the one score shown in a cell is not the mean but is 
merely one score out of an infinite population of scores correspond- 
ing to that cell. As representative of the cell it involves an error 
of measurement, and thus variation within cells is also present 
in the residual variance of Table 9.9. 

When there is only one score in a cell, the only error variance 
available is the residual variance which includes both the variance 
among subclass means and the variance among individuals within 
subclasses. These are confounded and cannot be disentangled.
There is no possibility of testing interaction against variance 
within cells. 

Now let us consider the numerator mean square, that is the 
mean square among means of orders. These means are subject 
to sampling variation of two kinds. If new individuals are ob- 
served, the total level of performance of those individuals (their 
row means) will not affect the comparison among the order means 
but their differential response to order may. 

As new individuals are observed, new subclasses arise and if 
there is interaction among the subclass means such interaction 
is properly a part of the error variance for testing order difference. 
The variation among the order means is also affected by the
variation within subclasses, for each score involves a measurement
error. Thus the numerator mean square among order means (that
is column means in Table 9.9) is an estimate of the same
population variance as is the residual variance which represents the
discrepance of the subclass observations from the estimate
provided by row and column means, this residual variance being a
value which includes both interaction and variation within
subclasses. When there is only one case in each subclass the
discrepance is the only measure of error available and no problem
of choice arises.

To illustrate a situation in which a choice of error variance
must be made we may consider a problem structurally similar to
that of the 11 archers except that several scores are available in
each subclass. Four methods of instruction are carried on in
each of five schools, with several persons in each school taught by 
the same method. Here there are four times five or twenty sub- 
classes. However, these subclasses are not a stratification of the 
population since the schools are themselves a sample out of all 
possible schools just as the 11 archers were a sample out of all 
possible archers. The variation among means of methods is due 
in part to variation within each of the observed twenty subclasses 
but also in part to variation of subclass means from the values 
they would have if they were completely dependent upon their 
row and column means. The latter variation is measured by the 
interaction mean square. 

Hence in a situation like the one just described a test is first 
made for significance of interaction by using the variation within 
subclasses as error. If the interaction mean square is significantly 
greater than the mean square within subclasses that interaction 
mean square is used as the error variance for testing significance 
of row and column effects. If the interaction mean square is not 
significantly greater than the variance within subclasses then this 
interaction may be considered an estimate of the same population 
variance as the mean square within subclasses. In such case the 
sum of squares for interaction and for variation within subclasses 
can be advantageously combined to form a new error variance for 
testing row and column effects. These procedures will be dis- 
cussed. 
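The rule for choosing the error variance can be written out as a small
decision function. This is a sketch only: the name
`choose_error_term` and the sums of squares in the usage lines are
hypothetical, and the critical value of F must be supplied by the
caller from the probability tables.

```python
def choose_error_term(interaction_ss, interaction_df,
                      within_ss, within_df, critical_f):
    """Return (error mean square, degrees of freedom) for row and column tests.

    critical_f is the tabled F value for (interaction_df, within_df)
    at the chosen level of significance.
    """
    interaction_ms = interaction_ss / interaction_df
    within_ms = within_ss / within_df
    if interaction_ms / within_ms > critical_f:
        # Significant interaction: use it as the error variance.
        return interaction_ms, interaction_df
    # Otherwise pool interaction with variation within subclasses.
    pooled_df = interaction_df + within_df
    return (interaction_ss + within_ss) / pooled_df, pooled_df


# Hypothetical layout: 4 methods in 5 schools with 4 pupils per cell,
# giving 12 and 60 degrees of freedom; 1.92 is the tabled .05 value.
print(choose_error_term(600.0, 12, 1200.0, 60, critical_f=1.92))
print(choose_error_term(300.0, 12, 1200.0, 60, critical_f=1.92))
```

In the first call the interaction mean square (50) is significantly
larger than the mean square within subclasses (20) and is returned as
the error term; in the second the two are pooled on 72 degrees of
freedom.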

Mathematical Models for the Two-way Layout. The discussion 
of the previous section will now be amplified by a symbolic analysis 
based on mathematical models used in analysis of variance. 
The mathematical model which is used to describe the population 
plays an important part in determining how the analysis of 
variance is to be carried out. 

The effect of the mathematical model on the analysis of vari- 
ance is due to the logic of the F distribution. It has been stated 
previously that the mean squares in the numerator of F and in
its denominator must be estimates of the same variance in order 
that the F test may be made. For accurate use of the F test it is 
necessary to determine in each case whether this condition is 
satisfied. 

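Consider first the mathematical model used in the experiment
on remedial reading. In this model the mean of each cell is
assumed to be the constant $\mu_{ij}$.

TABLE 14.5 Population Values Estimated by Mean Squares in an Experiment
with r Rows, c Columns and n Individuals in each Cell

Source of variation: Rows
    Mean square: $\dfrac{cn\sum_i(\bar{X}_{i.} - \bar{X})^2}{r - 1}$
    Population value estimated: $\sigma^2 + \dfrac{cn}{r - 1}\sum_i(\mu_{i.} - \mu)^2$

Source of variation: Columns
    Mean square: $\dfrac{rn\sum_j(\bar{X}_{.j} - \bar{X})^2}{c - 1}$
    Population value estimated: $\sigma^2 + \dfrac{rn}{c - 1}\sum_j(\mu_{.j} - \mu)^2$

Source of variation: Interaction
    Mean square: $\dfrac{n\sum_i\sum_j(\bar{X}_{ij} - \bar{X}_{i.} - \bar{X}_{.j} + \bar{X})^2}{(r - 1)(c - 1)}$
    Population value estimated: $\sigma^2 + \dfrac{n}{(r - 1)(c - 1)}\sum_i\sum_j(\mu_{ij} - \mu_{i.} - \mu_{.j} + \mu)^2$

Source of variation: Individuals within subclasses
    Mean square: $\dfrac{\sum_i\sum_j\sum_\alpha(X_{ij\alpha} - \bar{X}_{ij})^2}{rc(n - 1)}$
    Population value estimated: $\sigma^2$
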


Population values estimated in Table 14.5 are determined only
on this assumption. Each of the hypotheses $H_1$, $H_2$ and $H_3$
modifies these values so that if the hypothesis is true the estimated
value becomes $\sigma^2$ in each case. Consider for example the
hypothesis $H_1$ which states that $\mu_{1.} = \mu_{2.} = \cdots = \mu_{r.} = \mu$. If this
hypothesis is true each difference $\mu_{i.} - \mu$ is zero. Hence the expression
$\frac{cn}{r - 1}\sum_i(\mu_{i.} - \mu)^2$ is zero, and the estimated population value for rows
is $\sigma^2$. It follows that under the appropriate hypothesis each F
ratio formed in Table 14.3 satisfies the condition that the
numerator and denominator are estimates of the same variance.

Consider now the model for an experiment in which four 
remedial reading methods are tried in each of several school 
systems, the school systems being chosen at random from a popu- 
lation of school systems. Suppose that the teaching methods are 
set up in rows and the school systems in columns. The mean 
of a row can be regarded as the same for all individuals in the 
row but the mean of a column is a variable which depends on the 
particular school system chosen for that column. The mean of
a cell must take into account these components and their
interaction.

The discussion will be clarified if the following symbols are
introduced.

$\mu_{i.}$ is the mean of a method of instruction. This is the same
(constant) for all samples. The mean of the $\mu_{i.}$ is $\mu$.

$a_j$ is a component due to the school system. This will vary
as different school systems are introduced. We shall consider
it a normally distributed variable with mean zero and variance $\sigma_a^2$.

$b_{ij}$ is a component due to the interaction of method of
instruction and school system. As different school systems are chosen,
$b_{ij}$ will vary. We shall consider $b_{ij}$ as distributed normally and
independently of $a_j$ with mean zero and variance $\sigma_b^2$.

With this symbolism the mean of a cell in the ith row and
jth column is

$$\mu_{i.} + a_j + b_{ij}$$

In Table 14.6 are shown sources of variation and related estimates 
of population values for the statistics in Table 14.2 under the 
model just described. 


TABLE 14.6 Population Values Estimated in a Two-way
Layout When the Column Means are Variables

Source of variation                 Population value estimated by mean square

Rows                                $\sigma^2 + n\sigma_b^2 + \dfrac{cn}{r - 1}\sum_i(\mu_{i.} - \mu)^2$
Columns                             $\sigma^2 + n\sigma_b^2 + nr\sigma_a^2$
Interaction                         $\sigma^2 + n\sigma_b^2$
Individuals within subclasses       $\sigma^2$


Table 14.6 indicates that the F ratio cannot always have the
mean square within subclasses as a denominator. Under the
hypothesis that $\mu_{i.} = \mu$ for all row means, the estimated mean
square for rows is $\sigma^2 + n\sigma_b^2$ while the mean square within
subclasses is $\sigma^2$.

The tests of significance are carried out as follows. First,
test for interaction using the mean square for interaction as
numerator and the mean square within subclasses as denominator.
Under the hypothesis that interaction is zero ($\sigma_b^2 = 0$) both
mean squares are estimates of $\sigma^2$.

The tests for rows and columns depend on the outcome of the
test for interaction. If the hypothesis that $\sigma_b^2 = 0$ is rejected,
then under the hypothesis that $\mu_{i.} = \mu$, the estimated mean square
for rows is the same as for interaction. For columns, the
hypothesis that there is no column effect is given by $\sigma_a^2 = 0$. Under this
hypothesis, too, the F test is made by using the mean square for
interaction in the denominator.
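The reasoning from Table 14.6 can be made concrete by evaluating the
population values for chosen parameters. The function below is a
sketch written for this purpose; its name and the numerical values
passed to it are hypothetical.

```python
def expected_mean_squares(n, r, c, sigma2, sigma2_a, sigma2_b, row_means):
    """Population values estimated by each mean square when the rows are
    fixed methods and the columns are randomly chosen school systems."""
    mu = sum(row_means) / len(row_means)
    row_effect = c * n / (r - 1) * sum((m - mu) ** 2 for m in row_means)
    return {
        "rows": sigma2 + n * sigma2_b + row_effect,
        "columns": sigma2 + n * sigma2_b + n * r * sigma2_a,
        "interaction": sigma2 + n * sigma2_b,
        "within": sigma2,
    }


# Hypothetical values: equal row means and no column effect, but a
# non-zero interaction variance.
ems = expected_mean_squares(n=5, r=4, c=6, sigma2=30.0, sigma2_a=0.0,
                            sigma2_b=4.0,
                            row_means=[100.0, 100.0, 100.0, 100.0])
for source, value in ems.items():
    print(f"{source:12s} {value:6.1f}")
```

With interaction present, the mean squares for rows and columns under
their null hypotheses estimate the same quantity as the interaction
mean square, not the mean square within subclasses, which is why
interaction is the proper denominator.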




If the hypothesis that interaction is zero is accepted then either 
the mean square for interaction or the mean square within sub- 
classes may be used. However, it is preferable to combine the 
sum of squares for interaction with the sum of squares within 
subclasses, and similarly to combine the degrees of freedom. Then 
from this combination a mean square is found to use in the denomi- 
nator as estimate of the error variance. In this way more degrees 
of freedom are available in the denominator. 

The experiment described under the second model is often 
referred to as a randomized block experiment, in analogy with 
agricultural experiments in which the same experiment is repeated 
on several blocks of land. Here the block is the school system. 
After the groups have been selected within each block, the methods 
of instruction are assigned randomly to the groups. 

Experimental Procedure Replicated on Each of Several Indi- 
viduals. In some experiments it is possible to carry out the entire 
experimental procedure on each individual in the study. A great
advantage of carrying out an experiment in this way, when 
conditions permit it, is that differences between persons are 
eliminated from the comparisons among elements of the experi- 
ment. Another advantage is that fewer cases are needed, because 
several measures are obtained from each case. When several 
measures are obtained from the same individual, these measures 
are correlated and the calculations must take the correlations 
into account. Calculation appropriate for this kind of problem 
will be considered. 

As an example of an experiment replicated on each of several 
individuals, we shall consider once more the experiment on 
archery discussed in Chapter 9. Each subject in that study shot 
daily for six days at ranges of 50, 40 and 30 yards. The order of 
shooting at these ranges varied from day to day so that each 
subject shot each range twice in the first position, twice in second, 
and twice in third. In Chapter 9 only part of these data were 
utilized. In Table 14.7 are given all the scores for the entire 
period for each of 11 subjects classified by range and by order of 
shooting that range. Each entry in the table is the sum of two 
scores. Thus, if on one day the ranges were shot 50-30-40 and 
on another day were shot 40-30-50, the two scores at range 30 would 
be combined to make the score for range 30 in the second order. 

TABLE 14.7 Scores of Eleven Archers Shooting at Three Ranges
and in Three Orders at each Range*

                     Score at given range and order†

Archer      50 yards            40 yards            30 yards        Total
           1     2     3       1     2     3       1     2     3

A        114   182   123     248   241   157     305   255   286     1911
B         99   121   119     101   160   153     203   188   225     1369
C         51   100    35     153   149   148     201   184   238     1259
D         78   144    72     114   142   132     163   167   219     1231
E        134   125   201     246   214   237     267   182   259     1865
F         71    38    49     105   117    84     146   120   157      887
G         66    67   107     122   125   133     195   235   181     1231
H         33    89    51     141   164   113     162   166   213     1132
I        146   159   157     181   222   187     275   240   227     1794
J         80   101    83     109   120   106     170   110   126     1005
K        105   113   113     186   185   188     232   166   221     1509

Total    977  1239  1110    1706  1839  1638    2319  2013  2352   15,193

* Data from Schroeder.

† The number 1 indicates that the given range was the first of the three ranges
to be shot, and the numbers 2 and 3 are to have similar interpretations.

Note that each score in Table 14.7 can be classified in
three ways as belonging to a range, an order and a subject. It
is convenient to think of each score as belonging to a cell in a
rectangular solid, one score to a cell. The solid may be pictured
as in Figure 14-1. Each of the 99 scores is represented as located
in one of the 99 cells of which this solid is composed. The 9 cells
containing the scores of the sixth archer have been indicated by
a cross section in which a shaded portion represents the cell
containing the score made by archer 6 shooting at 50-yard range in
first order.

Fig. 14-1. Three dimensional solid representing scores
made by 11 archers shooting in 3 orders at 3 ranges.




In the same manner in which marginal totals are obtained in 
a flat or two-way layout, marginal totals are obtained for this 
three-way layout. The marginal totals are represented as two- 
way layouts, one set of marginal totals for each pair of variables. 
The sets of marginal totals will be presented in Tables 14.8, 14.9, 
and 14.10. 


TABLE 14.8 Sum of Scores for All 11 Archers
Classified by Order and Range

              Sum of scores at given range
Order      50 yards    40 yards    30 yards      Total

1             977        1706        2319         5002
2            1239        1839        2013         5091
3            1110        1638        2352         5100

Total        3326        5183        6684       15,193


The reader may check that the nine entries giving sums for all 
subjects in Table 14.8 are the 9 entries in the lowest line of Table 
14.7. He may verify the entries in Tables 14.9 and 14.10 by adding 
appropriate entries in Table 14.7. For example the first entry 
in Table 14.9 is 419 and equals 114 + 182 + 123 in 14.7. 
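The marginal totals can be formed mechanically from the 99 scores, as
in the sketch below. The scores are those of Table 14.7; a few entries
that are illegible in this copy (for archers A, D and H) have been
filled in from the marginal totals of Tables 14.8, 14.9 and 14.10, and
the helper name `cell` is an assumption made for the illustration.

```python
# For each archer: orders 1-3 at 50 yards, then at 40 yards, then at 30 yards.
scores = {
    "A": [114, 182, 123, 248, 241, 157, 305, 255, 286],
    "B": [99, 121, 119, 101, 160, 153, 203, 188, 225],
    "C": [51, 100, 35, 153, 149, 148, 201, 184, 238],
    "D": [78, 144, 72, 114, 142, 132, 163, 167, 219],
    "E": [134, 125, 201, 246, 214, 237, 267, 182, 259],
    "F": [71, 38, 49, 105, 117, 84, 146, 120, 157],
    "G": [66, 67, 107, 122, 125, 133, 195, 235, 181],
    "H": [33, 89, 51, 141, 164, 113, 162, 166, 213],
    "I": [146, 159, 157, 181, 222, 187, 275, 240, 227],
    "J": [80, 101, 83, 109, 120, 106, 170, 110, 126],
    "K": [105, 113, 113, 186, 185, 188, 232, 166, 221],
}


def cell(archer, range_index, order_index):
    """Score at a range (0: 50 yd, 1: 40 yd, 2: 30 yd) and order (0, 1, 2)."""
    return scores[archer][3 * range_index + order_index]


idx = range(3)
# Order by range, summed over archers (Table 14.8).
order_by_range = [[sum(cell(a, i, j) for a in scores) for i in idx] for j in idx]
# Archer by range, summed over orders (Table 14.9).
archer_by_range = {a: [sum(cell(a, i, j) for j in idx) for i in idx] for a in scores}
# Archer by order, summed over ranges (Table 14.10).
archer_by_order = {a: [sum(cell(a, i, j) for i in idx) for j in idx] for a in scores}
grand_total = sum(sum(row) for row in scores.values())

print(order_by_range)
print(archer_by_range["A"], archer_by_order["A"], grand_total)
```

Each two-way table of totals collapses the solid along one of its three
dimensions, and every table sums to the same grand total.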

TABLE 14.9 Sum of Scores for All 3 Orders
Classified by Archer and Range

              Sum of scores at given range
Archer     50 yards    40 yards    30 yards      Total

A             419         646         846         1911
B             339         414         616         1369
C             186         450         623         1259
D             294         388         549         1231
E             460         697         708         1865
F             158         306         423          887
G             240         380         611         1231
H             173         418         541         1132
I             462         590         742         1794
J             264         335         406         1005
K             331         559         619         1509

Total        3326        5183        6684       15,193

TABLE 14.10 Sum of Scores for All 3 Ranges
Classified by Archer and Order

              Sum of scores at given order
Archer         1           2           3         Total

A             667         678         566         1911
B             403         469         497         1369
C             405         433         421         1259
D             355         453         423         1231
E             647         521         697         1865
F             322         275         290          887
G             383         427         421         1231
H             336         419         377         1132
I             602         621         571         1794
J             359         331         315         1005
K             523         464         522         1509

Total        5002        5091        5100       15,193

From the data in Table 14.7 and its subtables the various
component sums of squares can be calculated. There are seven
of these components. Three are the main effects, the effects
of range, order and subject. Three more are the interactions
between pairs of factors, called first order interactions. The
remaining one is the joint interaction between all three variables,
called the second order interaction.

As usual the calculations begin with the correction term which is

$$\frac{(15{,}193)^2}{99} = 2{,}331{,}588$$

Then the sums of squares for main effects are as follows:

Archers:

$$\frac{1}{9}\left[(1911)^2 + (1369)^2 + \cdots + (1005)^2 + (1509)^2\right] - 2{,}331{,}588 = 134{,}395$$

Range:

$$\frac{1}{33}\left[(3326)^2 + (5183)^2 + (6684)^2\right] - 2{,}331{,}588 = 171{,}491$$

Order:

$$\frac{1}{33}\left[(5002)^2 + (5091)^2 + (5100)^2\right] - 2{,}331{,}588 = 178$$


Note that in these calculations the number by which each sum 
of squares is divided is the number of cases which enter into that 
sum. 
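The correction term and the three main-effect sums of squares need
only the marginal totals, as the following sketch shows. The function
name `main_effect_ss` is an assumption made for the illustration.

```python
archer_totals = [1911, 1369, 1259, 1231, 1865, 887, 1231, 1132, 1794, 1005, 1509]
range_totals = [3326, 5183, 6684]
order_totals = [5002, 5091, 5100]
n_scores = 99

grand_total = sum(archer_totals)
correction = grand_total ** 2 / n_scores


def main_effect_ss(totals, cases_per_total):
    """Sum of squared totals, divided by the number of cases in each
    total, less the correction term."""
    return sum(t ** 2 for t in totals) / cases_per_total - correction


ss_archers = main_effect_ss(archer_totals, 9)    # 9 scores per archer
ss_range = main_effect_ss(range_totals, 33)      # 33 scores per range
ss_order = main_effect_ss(order_totals, 33)      # 33 scores per order

print(round(correction), round(ss_archers), round(ss_range), round(ss_order))
```

The sketch carries full precision throughout, so a result may differ
by a unit from a figure obtained by subtracting rounded intermediate
values.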

The interactions can be computed from Tables 14.8, 14.9, 
and 14.10. 


Range X Order interaction from Table 14.8:

$$\begin{aligned}
&\frac{1}{11}\left[(977)^2 + (1706)^2 + \cdots + (1638)^2 + (2352)^2\right]
 - \frac{1}{33}\left[(3326)^2 + (5183)^2 + (6684)^2\right] \\
&\quad - \frac{1}{33}\left[(5002)^2 + (5091)^2 + (5100)^2\right] + \frac{(15{,}193)^2}{99} \\
&= 2{,}514{,}453 - 2{,}503{,}079 - 2{,}331{,}766 + 2{,}331{,}588 = 11{,}196
\end{aligned}$$

Range X Archer interaction from Table 14.9:

$$\begin{aligned}
&\frac{1}{3}\left[(419)^2 + (646)^2 + \cdots + (559)^2 + (619)^2\right]
 - \frac{1}{33}\left[(3326)^2 + (5183)^2 + (6684)^2\right] \\
&\quad - \frac{1}{9}\left[(1911)^2 + (1369)^2 + \cdots + (1005)^2 + (1509)^2\right] + \frac{(15{,}193)^2}{99} \\
&= 2{,}656{,}966 - 2{,}503{,}079 - 2{,}465{,}983 + 2{,}331{,}588 = 19{,}492
\end{aligned}$$

Order X Archer interaction from Table 14.10:

$$\begin{aligned}
&\frac{1}{3}\left[(667)^2 + (678)^2 + \cdots + (464)^2 + (522)^2\right]
 - \frac{1}{33}\left[(5002)^2 + (5091)^2 + (5100)^2\right] \\
&\quad - \frac{1}{9}\left[(1911)^2 + (1369)^2 + \cdots + (1509)^2\right] + \frac{(15{,}193)^2}{99} \\
&= 2{,}480{,}800 - 2{,}331{,}766 - 2{,}465{,}983 + 2{,}331{,}588 = 14{,}639
\end{aligned}$$

Order X Range X Archer interaction:

Perhaps the simplest way to calculate the second order
interaction is to calculate it as a residual, subtracting from the total
sum of squares the component sums of squares which have just
been computed. The total sum of squares is obtained from
Table 14.7 as the sum of the squares of the 99 scores minus the
correction term.

$$\begin{aligned}
(114)^2 + (182)^2 + \cdots + (166)^2 + (221)^2 - \frac{(15{,}193)^2}{99}
&= 2{,}703{,}221 - 2{,}331{,}588 \\
&= 371{,}633
\end{aligned}$$

Then the second order interaction sum of squares is

$$371{,}633 - 134{,}395 - 171{,}491 - 178 - 11{,}196 - 19{,}492 - 14{,}639 = 20{,}242$$
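The whole subdivision can be reproduced from the 99 scores with a
compact sketch. The scores are those of Table 14.7, with a few entries
that are illegible in this copy restored from the marginal totals; the
helper names `x` and `term` are assumptions made for the illustration.

```python
# For each archer: orders 1-3 at 50 yards, then at 40 yards, then at 30 yards.
scores = {
    "A": [114, 182, 123, 248, 241, 157, 305, 255, 286],
    "B": [99, 121, 119, 101, 160, 153, 203, 188, 225],
    "C": [51, 100, 35, 153, 149, 148, 201, 184, 238],
    "D": [78, 144, 72, 114, 142, 132, 163, 167, 219],
    "E": [134, 125, 201, 246, 214, 237, 267, 182, 259],
    "F": [71, 38, 49, 105, 117, 84, 146, 120, 157],
    "G": [66, 67, 107, 122, 125, 133, 195, 235, 181],
    "H": [33, 89, 51, 141, 164, 113, 162, 166, 213],
    "I": [146, 159, 157, 181, 222, 187, 275, 240, 227],
    "J": [80, 101, 83, 109, 120, 106, 170, 110, 126],
    "K": [105, 113, 113, 186, 185, 188, 232, 166, 221],
}
archers = list(scores)
idx = range(3)


def x(a, i, j):
    """Score of archer a at range index i and order index j."""
    return scores[a][3 * i + j]


def term(totals, cases):
    """Sum of squared totals divided by the number of cases in each total."""
    return sum(t ** 2 for t in totals) / cases


c_term = sum(sum(v) for v in scores.values()) ** 2 / 99
rng_term = term([sum(x(a, i, j) for a in archers for j in idx) for i in idx], 33)
ord_term = term([sum(x(a, i, j) for a in archers for i in idx) for j in idx], 33)
arc_term = term([sum(scores[a]) for a in archers], 9)
ro_term = term([sum(x(a, i, j) for a in archers) for i in idx for j in idx], 11)
ra_term = term([sum(x(a, i, j) for j in idx) for a in archers for i in idx], 3)
oa_term = term([sum(x(a, i, j) for i in idx) for a in archers for j in idx], 3)
total_ss = sum(v ** 2 for row in scores.values() for v in row) - c_term

ss = {
    "range": rng_term - c_term,
    "order": ord_term - c_term,
    "archers": arc_term - c_term,
    "range x order": ro_term - rng_term - ord_term + c_term,
    "range x archer": ra_term - rng_term - arc_term + c_term,
    "order x archer": oa_term - ord_term - arc_term + c_term,
}
# Second order interaction obtained as a residual.
ss["second order interaction"] = total_ss - sum(ss.values())

for name, value in ss.items():
    print(f"{name:26s} {value:10.0f}")
```

Obtaining the second order interaction as a residual means that any
error in an earlier component is carried into it, so the components
are worth checking individually first.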


The sums of squares together with their degrees of freedom 
and the derived mean squares are presented in Table 14.11. In 
this table are also presented the F ratios for interactions. A 
justification for using these ratios to furnish tests of significance 
is given below. 


TABLE 14.11 Analysis of Variance of Data of Table 14.7
and Test for First Order Interaction

Source of variation          d.f.    Sum of      Mean       F      F.95    F.99
                                     squares     square

Range                          2     171,491     85,746
Order                          2         178         89
Archers                       10     134,395     13,440
Order X range                  4      11,196      2,799    5.53    2.61    3.83
Order X archer                20      14,639        732    1.45    1.84    2.37
Range X archer                20      19,492        975    1.93    1.84    2.37
Second order interaction      40      20,242        506



The number of degrees of freedom for each main effect in the 
analysis of variance table is one less than the number of classes for 
that trait. For a first order interaction the number of degrees 
of freedom is the product of the degrees of freedom for the two 
main effects which enter into the interaction. For the second 
order interaction the number of degrees of freedom is the product 
of the degrees of freedom for the three main effects in the inter- 
action. The sum of all degrees of freedom is one less than the 
number of cases in the study. 
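The rule for degrees of freedom, and the F ratios for the first order
interactions, can be sketched as follows. The names used are
assumptions made for the illustration; the sums of squares are those
obtained in the preceding calculations, with the second order
interaction serving as error.

```python
levels = {"range": 3, "order": 3, "archer": 11}  # number of classes per factor


def degrees_of_freedom(*factors):
    """Product of (classes - 1) over the factors entering the effect."""
    df = 1
    for factor in factors:
        df *= levels[factor] - 1
    return df


effects = [
    ("range",), ("order",), ("archer",),
    ("order", "range"), ("order", "archer"), ("range", "archer"),
    ("order", "range", "archer"),
]
df_table = {" x ".join(e): degrees_of_freedom(*e) for e in effects}

# Sums of squares for the interactions, from the steps above.
ss_table = {
    "order x range": 11196,
    "order x archer": 14639,
    "range x archer": 19492,
    "order x range x archer": 20242,
}
error_ms = ss_table["order x range x archer"] / df_table["order x range x archer"]
f_ratios = {
    name: (ss_table[name] / df_table[name]) / error_ms
    for name in ("order x range", "order x archer", "range x archer")
}

print(df_table)
print({name: round(f, 2) for name, f in f_ratios.items()})
```

The degrees of freedom sum to 98, one less than the 99 scores, which
provides a check that no source of variation has been omitted.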

Sources of Variation. Before proceeding with tests of
significance a discussion of the sources of variation which enter into
the analysis of variance is desirable. The discussion will deal with
the variables themselves, or the main effects, and with the inter- 
actions among the variables. 

1. The range. This will be considered as having the same 
fixed values, 30, 40 and 50 yards, for all samples. 

2. The order of shooting. This will be considered as having 
the same fixed values 1, 2 and 3, for all samples. 

3. The skill of the individual. The individuals will be con- 
sidered as a sample of all possible similar individuals. Skill, as 
represented by mean score of an individual in repeated trials, 
will vary from individual to individual, and, thus, mean skill 
will vary from sample to sample. To consider the mean skills 
of individuals as fixed from sample to sample would restrict the 
generalization to the individuals in the study and would make the 
study of little interest. Consequently the more general view will 
be adopted here. 

4. Interaction of range with order. Since both range and order 
are fixed for all samples their interaction is also fixed. 

5. Interaction of range with skill of individuals. Since the 
individuals vary from sample to sample, this interaction is a
variable.

6. Interaction of order with skill of individuals. For the same 
reason this interaction is also a variable. 

To show how this view of the sources of variation influences 
the analysis of variance we shall introduce symbolism with which 
to describe the mathematical model. 

The following subscripts will be used: $i$ indicating any one of
the $r$ ranges; $j$ indicating any one of the $s$ orders; $\alpha$ indicating
any one of the $t$ individuals.

The symbol $\mu_{ij}$ will be used for the population mean at the
$i$th range and $j$th order. The Greek letter $\mu$ indicates that this
mean is the same for all samples and thus is really a character of
the population from which the samples are drawn.

To describe the score $X_{ij\alpha}$ made by the $\alpha$th individual shooting
at range $i$ in order $j$, symbols will be needed for three additional
variables, and for these the Latin letters $a$, $b$, and $c$, with
appropriate subscripts will be used.


Symbol          Component                                        Expectation    Variance

$a_\alpha$      Extent to which the $\alpha$th individual             0         $\sigma_a^2$
                differs from the population mean $\mu$:
                $a_\alpha = \mu_\alpha - \mu$

$b_{i\alpha}$   Extent to which the $\alpha$th individual             0         $\sigma_b^2$
                shooting at range $i$ differs from what
                is expected in terms of his own mean
                and the range mean:
                $b_{i\alpha} = \mu_{i\alpha} - \mu_{i.} - \mu_\alpha + \mu$

$c_{j\alpha}$   Extent to which the $\alpha$th individual             0         $\sigma_c^2$
                shooting in order $j$ differs from what
                is expected in terms of his own ability
                and the order mean:
                $c_{j\alpha} = \mu_{j\alpha} - \mu_{.j} - \mu_\alpha + \mu$


These three components are assumed to vary from individual
to individual, independently of each other, and to be normally
distributed. Then the score made by the $\alpha$th individual shooting
at range $i$ in order $j$ will differ from

$$\mu_{ij\alpha} = \mu_{ij} + a_\alpha + b_{i\alpha} + c_{j\alpha}$$

only through an error of measurement. All the possible scores
which can be made by a particular individual shooting at fixed
range and in fixed order may be considered as a normal population
with mean $\mu_{ij\alpha}$ and variance $\sigma^2$. Each cell of the table has such
a normal population, all populations having the same variance
but quite possibly differing as to mean. In the data under
consideration, however, we have only one observation from any one
such population. Under the assumption that the second order
interaction is zero in the population, the second order interaction
of the sample provides an estimate of $\sigma^2$.

On the basis of this mathematical model the mean squares in 
Table 14.11 are estimates of the population values named in 
Table 14.12. 




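TABLE 14.12 Population Values Estimated by the Mean Squares
of Table 14.11

Source of variation        Population values estimated

Range                      $\sigma^2 + s\sigma_b^2 + \frac{st}{r-1}\sum_i(\mu_{i.} - \mu)^2$
Order                      $\sigma^2 + r\sigma_c^2 + \frac{rt}{s-1}\sum_j(\mu_{.j} - \mu)^2$
Archers                    $\sigma^2 + s\sigma_b^2 + r\sigma_c^2 + rs\sigma_a^2$
Range X Order              $\sigma^2 + \frac{t}{(r-1)(s-1)}\sum_i\sum_j(\mu_{ij} - \mu_{i.} - \mu_{.j} + \mu)^2$
Order X Archer             $\sigma^2 + r\sigma_c^2$
Range X Archer             $\sigma^2 + s\sigma_b^2$
Order X Range X Archer     $\sigma^2$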


Tests of Significance for Interactions. By study of Table 14.12 
the appropriate tests of significance can be ascertained. Consider 
first the tests of significance for interaction. 

1. Interaction of range with order. The hypothesis that this
interaction is zero is given by the expression

$$\mu_{ij} - \mu_{i.} - \mu_{.j} + \mu = 0$$

for each range and each order. Under this hypothesis the
mean square for range X order interaction and the mean
square for the order X range X archer interaction are independent
estimates of $\sigma^2$. Consequently the F ratio using these mean
squares is an appropriate test for the hypothesis that no
interaction exists between order and range.

The entry in Table 14.11 shows that this F has the value 5.53, 
so that the effect of order appears to depend on range. 

2. Interaction of archer with range. The hypothesis that this
interaction is zero is given by the expression

$$\sigma_b^2 = 0$$

Hence the mean squares for interaction of range X archer and the
second order interaction are both estimates of $\sigma^2$ and the F test
applies. The value of F to test this interaction, as found in
Table 14.11, is 1.93, which is significant at the 5% level, and the
hypothesis of no interaction may be rejected.

3. Interaction of archer with order. By an argument like the 
preceding the appropriate F test for this hypothesis is the ratio of 
mean square for order x archer to the mean square for the second 
order interaction. The F value is not significant at the 5% level. 




Choice of Error Variance for the Main Effects. This choice 
must be made in the light of a preliminary examination of the 
first order interactions, such as is shown in Table 14.11. When 
the null hypothesis can be accepted for all of the first order 
interactions, then each of the four mean squares for interaction
is an estimate of $\sigma^2$. The appropriate procedure then is to pool
all the four sums of squares for interaction forming a new sum of
squares for error, to pool the corresponding degrees of freedom,
and to obtain a new estimate of $\sigma^2$ by dividing this sum of squares
for error by the sum of the degrees of freedom for the various
interactions. This estimate of the error variance forms the 
denominator in the F ratio used to test each of the main effects. 

In the archery problem, however, the preliminary analysis 
indicates that one first order interaction may be treated as zero 
and the other two may not be. It will therefore be necessary to 
consider separately the estimate of error for each of the main 
effects. 

Since the purpose of the experiment was to determine the 
effect of order only, the tests of range and of archers are of no 
practical import and are introduced here only to illustrate the 
appropriate procedures. 

1. Test that range means are equal. Applying this hypothesis
to the quantity named in Table 14.12 in the line headed range, we
have $\frac{st}{r-1}\sum_i(\mu_{i.} - \mu)^2 = 0$, so the mean square for range under the
null hypothesis for range is an estimate of $\sigma^2 + s\sigma_b^2$ if archer X range
interaction exists and is an estimate of $\sigma^2$ if archer X range
interaction is zero ($\sigma_b^2 = 0$). In Table 14.11 this interaction has
been tested and found significant. Therefore the mean square for
range and the range X archer interaction are both estimates of
$\sigma^2 + s\sigma_b^2$ and the latter is the appropriate denominator to use
in the F ratio. Table 14.13 presents the final analysis for range.

2. Test that order means are equal. Applying this hypothesis to
the quantity named in Table 14.12 in the line headed order, we
have $\frac{rt}{s-1}\sum_j(\mu_{.j} - \mu)^2 = 0$, so the mean square for order is an
estimate of $\sigma^2 + r\sigma_c^2$ under the null hypothesis for order. Since
the preliminary analysis of Table 14.11 shows the archer X order
interaction non-significant, we pool its sum of squares with that
of the second order interaction and complete the analysis as in
Table 14.14.




TABLE 14.13 Analysis of Variance for Testing the Significance
of Range and of Archer in the Data from Table 14.11

Source of       d.f.    Sum of      Mean       F         F.95    F.99
variation               squares     square

Range             2     171,491     85,746     87.944    3.49    5.85
Archers          10     134,395     13,440     13.785    2.35    3.37
Error            20      19,492        975


TABLE 14.14 Analysis of Variance for Testing the Significance
of Order in the Data from Table 14.11

Source of       d.f.    Sum of      Mean       F         F.95    F.99
variation               squares     square

Order             2         178         89      .153     3.15    4.98
Error            60      34,881        581


3. Test that archer means are equal. This test presents a more
complex problem since the quantity named on the line headed
archer in Table 14.12 contains two first order interactions. For
our data the preliminary analysis has indicated that we may treat
$\sigma_c^2$ as zero but may not so treat $\sigma_b^2$. Under the null hypothesis
for archers, $\sigma_a^2 = 0$. Hence under that hypothesis the mean
square for archers is an estimate of $\sigma^2 + s\sigma_b^2$. As this same
quantity is also estimated by the archer X range interaction, the
latter is the appropriate estimate of error, and the analysis proceeds
as in Table 14.13.
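
The F ratios of Tables 14.13 and 14.14 can be checked with a few lines of Python. This is only an arithmetic sketch: the mean squares and the tabled critical values are copied from those two tables, and the dictionary layout and the helper name `f_ratio` are illustrative choices, not part of the original analysis.

```python
# Mean squares and tabled critical values copied from Tables 14.13 and 14.14.
# Each entry: (effect mean square, error mean square, tabled F.95, tabled F.99)
tests = {
    "range":   (85746, 975, 3.49, 5.85),  # error: range x archer, 20 d.f.
    "archers": (13440, 975, 2.35, 3.37),  # error: range x archer, 20 d.f.
    "order":   (89, 581, 3.15, 4.98),     # error: pooled interactions, 60 d.f.
}


def f_ratio(effect_ms, error_ms):
    """Ratio of an effect mean square to its chosen error mean square."""
    return effect_ms / error_ms


results = {}
for name, (effect_ms, error_ms, f95, f99) in tests.items():
    f = f_ratio(effect_ms, error_ms)
    # Record the ratio and whether it exceeds each tabled critical value.
    results[name] = (f, f > f95, f > f99)
    print(f"{name:8s} F = {f:8.3f}  exceeds F.95: {f > f95}  exceeds F.99: {f > f99}")
```

The point of keeping a separate error mean square per effect is the one made in the text: each main effect is tested against the interaction that estimates the same quantity under its null hypothesis.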

If both of the first order interactions with archers had been
found significant, we could not have treated either $\sigma_b^2$ or $\sigma_c^2$ as
zero. Then no single interaction would provide a proper test for
the difference among archers, which is the test that $\sigma_a^2 = 0$.
The mean squares must now be combined to form a proper estimate
of error, and we shall use the following key:

Symbol    Mean square
$k_0$     Archers
$k_1$     Archer X order interaction
$k_2$     Archer X range interaction
$k_3$     Archer X order X range interaction

Then

(14.10)  $$F = \frac{k_0}{k_1 + k_2 - k_3}$$

with $(t - 1)$ degrees of freedom for the numerator and

(14.11)  $$n = \frac{(k_1 + k_2 - k_3)^2}{\dfrac{k_1^2}{(s-1)(t-1)} + \dfrac{k_2^2}{(r-1)(t-1)} + \dfrac{k_3^2}{(r-1)(s-1)(t-1)}}$$

degrees of freedom for the denominator. (The reader may wish to
be reminded that $t$ has been used to indicate the number of archers,
$r$ the number of ranges and $s$ the number of orders.)
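
The combined test of (14.10) and (14.11) can be sketched as a small function. The function name and the mean squares in the example call are hypothetical, chosen only to show the arithmetic; they are not taken from Table 14.11. The degrees of freedom attached to each mean square are assumed to be those of the corresponding interaction, $(s-1)(t-1)$, $(r-1)(t-1)$ and $(r-1)(s-1)(t-1)$.

```python
def archer_f_test(k0, k1, k2, k3, r, s, t):
    """Return (F, numerator d.f., approximate denominator d.f.).

    k0: mean square for archers
    k1: archer x order mean square,          (s-1)(t-1) d.f.
    k2: archer x range mean square,          (r-1)(t-1) d.f.
    k3: archer x order x range mean square,  (r-1)(s-1)(t-1) d.f.
    """
    error = k1 + k2 - k3
    if error <= 0:
        raise ValueError("k1 + k2 - k3 must be positive to serve as an error term")
    f = k0 / error
    df1 = (s - 1) * (t - 1)
    df2 = (r - 1) * (t - 1)
    df3 = (r - 1) * (s - 1) * (t - 1)
    # Approximate degrees of freedom for the combined error term, formula (14.11).
    n = error ** 2 / (k1 ** 2 / df1 + k2 ** 2 / df2 + k3 ** 2 / df3)
    return f, t - 1, n


# Hypothetical mean squares for 3 ranges, 3 orders and 11 archers.
f, df_num, df_den = archer_f_test(k0=1200.0, k1=90.0, k2=100.0, k3=50.0, r=3, s=3, t=11)
print(f"F = {f:.3f} with {df_num} and about {df_den:.1f} degrees of freedom")
```

Because the denominator degrees of freedom are generally fractional, in practice one would enter the F table at the nearest smaller integer.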

Latin Square and Graeco-Latin Square. In problems which 
involve classification into several categories on each of several 
variables, it is often desirable to take observations for all combina- 
tions of the values of the different variables. By arranging the 
observations in one of the forms known as the Latin Square or the 
Graeco-Latin Square a large amount of information can be 
extracted from a relatively small number of observations. In the 
analysis of the data so obtained it is necessary to make the assump- 
tion that interactions are zero, and the investigator should recog- 
nize that he is making this assumption. 

Suppose, for example, that the director of a statistical labo- 
ratory is about to buy new machines and has several models under 
consideration. He wants to study the speed with which a skilled 
computer can complete a given amount of work on each machine. 
Obviously, if one person should perform the same computations on 
each of several machines there would probably be a practice effect. 
Consequently he must perform different computations, and there 
is a risk that those computations may not be equally difficult. The
solution therefore requires that there be as many computers as 
machines and as many problems as machines, and that each 
computer use each machine once and solve each problem once, 
and that each problem be computed once on each machine. In 
the following scheme, each row represents a machine, each column 
represents a problem, and the letters A, B, C, D and E represent 
computers. 


                        Problem
                  1    2    3    4    5
Machine I         A    B    C    D    E
Machine II        B    C    D    E    A
Machine III       C    D    E    A    B
Machine IV        D    E    A    B    C
Machine V         E    A    B    C    D

                     Latin Square




The 24 degrees of freedom among the 25 observations may be 
assigned as follows: 


Machines 4 
Problems 4 
Computers 4 
Residual error 12 

Total 24 
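
A Latin Square of this kind can be generated by cyclic shifting, and the allocation of degrees of freedom follows by subtraction. The sketch below is one simple construction, not the only possible arrangement; the function names are illustrative.

```python
def cyclic_latin_square(symbols):
    """Row i is the symbol list shifted i places to the left."""
    n = len(symbols)
    return [[symbols[(i + j) % n] for j in range(n)] for i in range(n)]


def is_latin_square(square):
    """True if every symbol occurs exactly once in each row and each column."""
    n = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all({square[i][j] for i in range(n)} == symbols for j in range(n))
    return len(symbols) == n and rows_ok and cols_ok


# Rows are machines, columns are problems, letters are computers.
square = cyclic_latin_square(list("ABCDE"))
for row in square:
    print(" ".join(row))

n = len(square)
df = {"machines": n - 1, "problems": n - 1, "computers": n - 1}
# Whatever is left of the n*n - 1 total degrees of freedom goes to residual error.
df["residual"] = (n * n - 1) - sum(df.values())
print(df)
```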


Now suppose it appears that the speed of a computer may be 
affected by the order in which he uses the machines. Then the 
experimental design must provide that each machine shall be 
used first by one computer, second by another, third by another, 
fourth by another, and last by the remaining computer. If now
the small Greek letters, α (alpha), β (beta), γ (gamma), δ (delta),
and ε (epsilon), be used to represent the order in which a machine
is used, the design may be called a Graeco-Latin Square. (The
same name, Graeco-Latin, is customarily applied even if the
second factor is indicated by numerals or by small Latin letters
instead of Greek letters.)


The 24 degrees of freedom now are assigned as follows:

Machines          4
Problems          4
Computers         4
Orders            4
Residual error    8

Total            24

          1     2     3     4     5
I        Aα    Bγ    Cε    Dβ    Eδ
II       Bβ    Cδ    Dα    Eγ    Aε
III      Cγ    Dε    Eβ    Aδ    Bα
IV       Dδ    Eα    Aγ    Bε    Cβ
V        Eε    Aβ    Bδ    Cα    Dγ

            Graeco-Latin Square


There are 4 variables (machines, operators, problems, orders). 
The data are classified into 5 categories on each of these variables. 
In the 25 observations recorded every category of one variable 
occurs once and only once in combination with every category 
of every other variable. Thus each row of the pattern displays 
every category of every variable except the row variable, and each 
column displays every category of every variable except the 
column variable. 
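
The defining property just stated, that every category of one variable meets every category of every other variable exactly once, can be checked mechanically. The sketch below builds one 5 by 5 Graeco-Latin Square by superimposing two cyclic squares with different step sizes. The construction rule (Latin index $i + j$, Greek index $i + 2j$, both modulo 5) is an assumption chosen for illustration, and it works as written only for an odd side length such as 5.

```python
from itertools import product

LATIN = "ABCDE"
GREEK = "αβγδε"


def graeco_latin_square(n=5):
    """Cell (i, j) holds the pair (Latin letter, Greek letter)."""
    return [[(LATIN[(i + j) % n], GREEK[(i + 2 * j) % n]) for j in range(n)]
            for i in range(n)]


def is_graeco_latin(square):
    """Each letter once per row and column, and each pair of letters once overall."""
    n = len(square)
    for part in (0, 1):  # 0 = Latin letters, 1 = Greek letters
        for k in range(n):
            row = {square[k][j][part] for j in range(n)}
            col = {square[i][k][part] for i in range(n)}
            if len(row) != n or len(col) != n:
                return False
    pairs = {square[i][j] for i, j in product(range(n), repeat=2)}
    return len(pairs) == n * n


square = graeco_latin_square()
for row in square:
    print("  ".join(latin + greek for latin, greek in row))

# Four variables with 4 degrees of freedom each leave 24 - 16 = 8 for residual error.
residual_df = (5 * 5 - 1) - 4 * (5 - 1)
print("residual d.f.:", residual_df)
```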

Application of Graeco-Latin Square to Study on Memorizing 
Music. In the standard Latin Square or Graeco-Latin Square, 
there is one observation in each cell. We now shall consider an 
arrangement of that sort obtained by Rubin-Rabson¹⁰ from an
experiment in the memorizing of piano music and subsequently
shall consider an adaptation of the arrangement in which there 
are several scores in each cell. Each subject came to the studio 
on four successive days. On each day he memorized one composi- 
tion and exactly three weeks later he returned to the studio and 
relearned the same composition. The time in minutes required 
for the first learning of a composition, the time for its relearning 
and the reduction in that time (that is, the amount by which the 
time for the first learning exceeded the time for relearning) are 
the basic data of the study. Each of the four compositions was 
studied by a different method, and the comparison of the four 
methods is the chief goal of the study. The effect upon method 
difference caused by difference in difficulty of compositions and 
in order of presentation of the compositions has been controlled 
by the experimental design. Therefore, the variation in composi- 
tions and in order of presentation must not be allowed to con- 
tribute to the error variance in the statistical analysis. 

The four methods to be compared will now be briefly described. 
The reader who is interested in more exact details should consult 
the original study. 


A: Seated at a table the subject studied the musical score and its 
analysis for 20 minutes. Then he carried the score to the piano and 
practiced until he had memorized it. When he could play it through 
twice without errors and without stopping he was admonished not to 
practice it further and to think of it as little as possible. 

B: This was the same as A except while the subject studied the score 
he was told to write down any relationship concerning voice movement, 
chords, etc., which he thought might help him in memorizing the compo- 
sition, and was allowed 25 minutes for the pre-study. 

C: There was no preliminary study period. Subject went directly 
to the piano but was warned to play the composition for the first time 
at a speed slow enough to avoid errors. 

D: Subject was seated before a phonograph and given the musical 
score. He listened to four successive repetitions of the composition while 
he followed the score. 


In each method, time was clocked from the beginning of the first 
playing of the composition until it was played through twice 
without errors. Three weeks later the subject returned to relearn 
the compositions in the same order as before but without any 
preliminary study. 

Thirty-six subjects were tested in 4 groups according to the
following scheme, where P, Q, R and S represent the 4 compositions
and A, B, C and D the 4 methods:

Group    First day    Second day    Third day    Fourth day
  1         AP            BQ            CR           DS
  2         DR            CS            BP           AQ
  3         BS            AR            DQ           CP
  4         CQ            DP            AS           BR


From the description of the original study it is not clear whether 
the four groups should be considered as a classification variable 
or not. If individuals were not assigned to the groups at random, 
but groups were selected on the basis of some principle of con- 
venience or existing class structure, then the variation from group 
to group might be large in relation to variation among individuals 
within groups and also large in relation to residual error. In that 
case the variation among group means should be taken out as a 
principal component. On the other hand, if individuals were 
assigned to groups by some purely random procedure the variation 
among groups is due to sampling error and should be a part of the 
error variance. It is true that no two groups were exposed to 
the same combination of method and composition on the same 
days, but that is of no importance if the assumption of zero 
interaction is justified. The interaction cannot be tested in this 
arrangement but is explicitly assumed to be zero. Inasmuch as 
the printed study does not state how persons were assigned to 
groups, it is not possible to be sure how groups should be treated 
in the analysis. 

As a first analysis we shall ignore the variation among persons
and shall consider as basic data the sum, for the nine individuals
in one group, of the number of minutes by which the time for the
original learning of a composition exceeded the time for relearning
it. These data are presented in Table 14.15.

There are 16 observations yielding 15 degrees of freedom. If
we consider group as a variable there are 4 variables (method,
composition, day and group), each of which has 3 degrees of
freedom. There remain $15 - (3 + 3 + 3 + 3) = 3$ degrees of
freedom for error. Each F ratio obtained will thus have 3 degrees
of freedom for the numerator variance and 3 for the denominator
variance, and so must exceed $F_{.95} = 9.28$ to be significant at the
.05 level. It is evident that this design cannot detect small
differences among the methods.




TABLE 14.15 Reduction in Learning Time in Minutes for Compositions P, Q,
R, and S When Studied by Methods A, B, C and D. (Each entry is the sum
for 9 subjects. Numerals 1, 2, 3, and 4 indicate group.)

Method    First day    Second day    Third day    Fourth day    Total
A         P1  80       R3  28        S4  59       Q2  40        207
B         S3 128       Q1  70.5      P2  96       R4  41.5      336
C         Q4  50       S2  57        R1   8       P3  62        177
D         R2  50.5     P4  94        Q3  92       S1  33.5      270
Total        308.5        249.5         255.0        177.0      990


The computation of the various sums of squares is shown in 
Table 14.16 and the resultant analysis of variance in Table 14.17. 
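
The sums of squares for this design can be reproduced in a few lines. In the sketch below the 16 cell values are read from Table 14.15 (each was checked against the printed row and column totals); the tuple layout and the helper name `factor_ss` are illustrative assumptions.

```python
from collections import defaultdict

# (method, day, composition, group, reduction in minutes), read from Table 14.15.
cells = [
    ("A", 1, "P", 1, 80.0), ("A", 2, "R", 3, 28.0), ("A", 3, "S", 4, 59.0), ("A", 4, "Q", 2, 40.0),
    ("B", 1, "S", 3, 128.0), ("B", 2, "Q", 1, 70.5), ("B", 3, "P", 2, 96.0), ("B", 4, "R", 4, 41.5),
    ("C", 1, "Q", 4, 50.0), ("C", 2, "S", 2, 57.0), ("C", 3, "R", 1, 8.0), ("C", 4, "P", 3, 62.0),
    ("D", 1, "R", 2, 50.5), ("D", 2, "P", 4, 94.0), ("D", 3, "Q", 3, 92.0), ("D", 4, "S", 1, 33.5),
]

grand_total = sum(cell[4] for cell in cells)
correction = grand_total ** 2 / len(cells)          # 990^2 / 16
total_ss = sum(cell[4] ** 2 for cell in cells) - correction


def factor_ss(index):
    """Sum of squares among the 4 levels of the factor stored at position `index`."""
    totals = defaultdict(float)
    for cell in cells:
        totals[cell[index]] += cell[4]
    # Each level total is a sum of 4 cell values, hence the divisor 4.
    return sum(t * t for t in totals.values()) / 4 - correction


names = ["methods", "days", "compositions", "groups"]
ss = {name: factor_ss(i) for i, name in enumerate(names)}
error_ss = total_ss - sum(ss.values())
error_ms = error_ss / 3                             # 15 - 4 * 3 = 3 d.f. for error
f = {name: (value / 3) / error_ms for name, value in ss.items()}

for name in names:
    print(f"{name:13s} SS = {ss[name]:9.3f}  F = {f[name]:.2f}")
print(f"{'error':13s} SS = {error_ss:9.3f}")
```

None of the four F ratios reaches the tabled value 9.28 for 3 and 3 degrees of freedom, which illustrates how little power this 16-observation design has.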

TABLE 14.16 Computation of the Relevant Sums of Squares
for the Data of Table 14.15

                           Methods       Compositions     Days          Groups

                           A   207       P   332          1   308.5     I     192
                           B   336       Q   252.5        2   249.5     II    243.5
                           C   177       R   128          3   255       III   310
                           D   270       S   277.5        4   177       IV    244.5

$\sum X$                   990           990              990           990
$\sum(\sum X)^2$           259974        267370.5         253776.5      252036.5
$\frac{1}{4}\sum(\sum X)^2$  64993.5       66842.625        63444.125     63009.125
$990^2/16$                 61256.25      61256.25         61256.25      61256.25
Sum of squares             3737.25       5586.375         2187.875      1752.875

Total sum of squares:

$$\sum_{1}^{4}\sum_{1}^{4} X^2 - \frac{990^2}{16} = (80^2 + 128^2 + \cdots + 33.5^2) - \frac{990^2}{16} = 75187 - \frac{980100}{16} = 75187 - 61256.25 = 13930.75$$

Error sum of squares:

$$13930.75 - (3737.25 + 5586.375 + 2187.875 + 1752.875) = 666.375$$

TABLE 14.17 Analysis of Variance of Reduction in Learning Time for 4
Compositions Studied by 4 Groups Using 4 Methods on 4 Days

Source of variation    Sum of squares    d.f.    Mean square    F       F.95
Total                  13930.75
Methods                 3737.25           3      1245.75        5.61    9.28
Compositions            5586.375          3      1862.125       8.38    9.28
Days                    2187.875          3       729.29        3.28    9.28
Groups                  1752.875          3       584.29        2.63    9.28
Error                    666.375          3       222.125

Latin-Square Pattern with Several Observations in Each Cell.
The previous analysis of the time required for memorizing piano
music under different methods of study did not make use of all
the data available. The raw data of the study, which may be seen
in Table 14.18, indicate that there were 9 observations in each
group and that the variability among these was disregarded in
the analysis of Table 14.17. As in the previous analysis, we shall
work only with the difference in time required for initial learning
and for relearning. There are 4 scores for each of 36 students
or 144 scores in all.


For all 144 scores we have

$$\sum_{1}^{36}\sum_{1}^{4} X^2 = 15920.5 \quad\text{and}\quad \sum_{1}^{36}\sum_{1}^{4} X = 990$$

Then the total sum of squares is

$$15920.5 - \frac{(990)^2}{144} = 15920.5 - 6806.25 = 9114.25$$

To find the sum of squares among the 36 individuals we first
compute

$$\sum_{1}^{36}\Bigl(\sum_{1}^{4} X\Bigr)^2 = 33^2 + 55.5^2 + (-13)^2 + \cdots + 33^2 + 12.5^2 = 41962$$

Then the sum of squares is

$$\frac{1}{4}\sum_{1}^{36}\Bigl(\sum_{1}^{4} X\Bigr)^2 - \frac{1}{144}\Bigl(\sum_{1}^{36}\sum_{1}^{4} X\Bigr)^2 = 10490.5 - 6806.25 = 3684.25$$

To find the sum of squares among groups we first compute

$$\sum_{1}^{4}\Bigl(\sum_{1}^{9}\sum_{1}^{4} X\Bigr)^2 = 192^2 + 243.5^2 + 310^2 + 244.5^2 = 252036.5$$

Then the sum of squares is

$$\frac{1}{36}(252036.5) - \frac{990^2}{144} = 194.76$$

and the mean square is $\frac{194.76}{3} = 64.92$.

TABLE 14.18 Basic Data on Memorizing Piano Music: Difference in Learning
Time in Minutes, by Subject and Method. (Data provided by Rubin-Rabson.)
[Table body not recoverable.]




It will be observed that this is exactly 1/9 of the value previously
found, because now we are treating the group mean as the mean
of 36 observations and previously we treated it as the mean of 4.
The sum of squares of individuals within groups is

$$3684.25 - 194.76 = 3489.49$$

and the mean square is $\frac{3489.49}{32} = 109.05$. The 32 degrees of
freedom may be arrived at by noting that from the 35 degrees of
freedom among 36 individuals there must be subtracted the
3 degrees of freedom among the 4 groups, or by noting that there
are 8 degrees of freedom for the 9 individuals in each of the
4 groups.

The test of the differences among group means is therefore
$F = \frac{64.92}{109.05} = .6$. The group means are not significantly different
in respect to the variation among individuals.
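
The test of group means against individuals within groups can be verified from the totals already given. The sketch uses only the group totals, the grand total 990, and the value 41962 reported for the sum of squared individual totals; the variable names are illustrative.

```python
group_totals = [192.0, 243.5, 310.0, 244.5]   # each is the sum of 36 scores
sum_sq_individual_totals = 41962.0            # reported sum of squared subject totals
grand_total, n_scores = 990.0, 144

correction = grand_total ** 2 / n_scores                    # 6806.25
ss_individuals = sum_sq_individual_totals / 4 - correction  # 4 scores per subject
ss_groups = sum(t * t for t in group_totals) / 36 - correction
ss_within_groups = ss_individuals - ss_groups

ms_groups = ss_groups / 3            # 4 groups, 3 d.f.
ms_within = ss_within_groups / 32    # 35 - 3 = 32 d.f.
f_groups = ms_groups / ms_within

print(f"SS groups = {ss_groups:.2f}, SS within groups = {ss_within_groups:.2f}")
print(f"F = {ms_groups:.2f} / {ms_within:.2f} = {f_groups:.2f}")
```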

In the analysis in Table 14.17, if group differences are not iso- 
lated they will form part of the estimate of error. In Table 14.19, 
if group differences are not isolated they will form part of the 
estimate of variation among individuals, which has here been 
separated from the error variance. Therefore, in the analysis 
of Table 14.17 if group differences are large it might be important 
to isolate them, while in the analysis of Table 14.19, no matter 
how large the differences among groups they cannot affect the 
error variance. Since we are not particularly interested in obtain- 
ing a correct estimate of the variation among individuals, there 
is no particular reason for investigating group differences. 

In passing it should be noted that it was necessary to have the 
same number of individuals in each cell, otherwise no solution 
could have been reached. 

The sum of squares among methods is

$$\frac{207^2 + 336^2 + 177^2 + 270^2}{36} - \frac{990^2}{144} = 7221.5 - 6806.25 = 415.25$$

which is exactly 1/9 of the value previously found. In the same
manner, the sum of squares among compositions and among days
will be exactly 1/9 of their former value.

The sum of squares among compositions is

$$\frac{332^2 + 252.5^2 + 128^2 + 277.5^2}{36} - \frac{990^2}{144} = 620.708$$

The sum of squares among days is

$$\frac{308.5^2 + 249.5^2 + 255^2 + 177^2}{36} - \frac{990^2}{144} = 243.097$$

Table 14.19 indicates a significant difference among methods,
among compositions, and among individuals, but not among days.


TABLE 14.19 Analysis of Variance of Reduction in Learning Time for 4 Musical
Compositions When Variation Among Individuals Is Separated from Error Variance

Source of        Sum of       d.f.    Mean      F       F.95    F.99
variation        squares              square
Total            9114.25      143
Methods           415.25        3     138.4     3.30    2.70    3.98
Compositions      620.71        3     206.9     4.94    2.70    3.98
Days              243.10        3      81.0     1.93    2.70    3.98
Individuals      3684.25       35     105.3     2.51    1.54    1.83
Error            4150.94       99      41.9
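
The entries of Table 14.19 follow from the marginal totals by the same arithmetic as before, now with 36 scores behind each total. The sketch below takes the total and individual sums of squares as given in the text and obtains the error term by subtraction; names are illustrative.

```python
totals = {
    "methods":      [207.0, 336.0, 177.0, 270.0],
    "compositions": [332.0, 252.5, 128.0, 277.5],
    "days":         [308.5, 249.5, 255.0, 177.0],
}
correction = 990.0 ** 2 / 144

# Each marginal total is now a sum of 36 individual scores.
ss = {name: sum(t * t for t in values) / 36 - correction
      for name, values in totals.items()}
ss["individuals"] = 3684.25                    # from the text
df = {"methods": 3, "compositions": 3, "days": 3, "individuals": 35}

total_ss, total_df = 9114.25, 143              # from the text
error_ss = total_ss - sum(ss.values())
error_df = total_df - sum(df.values())
error_ms = error_ss / error_df

f = {name: (ss[name] / df[name]) / error_ms for name in ss}
for name in ss:
    print(f"{name:13s} SS = {ss[name]:8.2f}  d.f. = {df[name]:2d}  F = {f[name]:.2f}")
print(f"{'error':13s} SS = {error_ss:8.2f}  d.f. = {error_df:2d}  MS = {error_ms:.1f}")
```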




Unequal Frequencies in the Subclasses. When the frequencies 
in the subclasses are unequal, the computation of sums of squares 
becomes very complex, if an exact solution is sought. A simple 
approximate method will be described. 

The method will be illustrated in relation to data in Table 13.1 
on page 316. The data are based on scores of 96 students in a final 
examination in elementary statistics. The three sections were 
taught by different instructors. Students have been classified by 
section and by sex. Each cell of Table 14.20 contains an entry 
which shows the number of students in the subclass and the mean 


score of these students. 


TABLE 14.20 Number of Male and Female Students in Each of Three Sections
in a Course in Statistics and Mean Score on Final Examination for Each Subgroup

Section    Item         Male       Female
I          $N$            28         13
           $\sum X$     1432        706
           $\bar X$     51.142     54.308
II         $N$            24          3
           $\sum X$     1240        134
           $\bar X$     51.667     44.667
III        $N$            18         10
           $\sum X$      791        416
           $\bar X$     43.944     41.600




The sum of squares of deviations from subclass means is

$$240{,}579 - \left(\frac{1432^2}{28} + \frac{1240^2}{24} + \frac{791^2}{18} + \frac{706^2}{13} + \frac{134^2}{3} + \frac{416^2}{10}\right) = 6883.6$$

The number of degrees of freedom for variation within
subclasses is 96 - 6 = 90. Hence the mean square for variation within
subclasses is 6883.6/90 = 76.48.

To obtain the mean square for error, the mean square within
subclasses is multiplied by the constant

$$\frac{1}{6}\left(\frac{1}{28} + \frac{1}{24} + \frac{1}{18} + \frac{1}{13} + \frac{1}{3} + \frac{1}{10}\right) = .10720$$

In this expression $\frac{1}{6}$ is the reciprocal of the number of subclasses
and the quantity within parentheses is the sum of the reciprocals
of subclass frequencies. The mean square for error is now
(76.48)(.10720) = 8.20.

The sums of squares for rows, columns and interaction are now
computed by treating each mean in Table 14.20 as a single
observation. The computing procedure described in Table 9.13 on page 225
yields the analysis of variance in Table 14.21.


TABLE 14.21 Analysis of Variance of Data in Table 14.20

Source of      Sum of     Degrees of    Mean      F       F.95    F.99
variation      squares    freedom       square
Sex              6.4          1           6.4      .78    3.95    6.93
Section         99.3          2          49.6     6.05    3.10    4.85
Interaction     25.9          2          12.9     1.57    3.10    4.85
Error                        90           8.20

The analysis indicates a significant difference between sections 
but not between sexes. The non-significant interaction indicates 
that sex differences have not been found to vary from section to 
section. 
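
The approximate method just described can be sketched as follows. The subclass frequencies, sums, and the total sum of squared scores 240,579 are taken from the text; the variable names are illustrative, and small differences from the printed table in the last digit come from rounding of the subclass means.

```python
# (section, sex) -> (number of students, sum of scores), from Table 14.20.
cells = {
    ("I", "M"): (28, 1432), ("I", "F"): (13, 706),
    ("II", "M"): (24, 1240), ("II", "F"): (3, 134),
    ("III", "M"): (18, 791), ("III", "F"): (10, 416),
}
sum_sq_scores = 240579          # sum of squared scores for all 96 students
sections, sexes = ["I", "II", "III"], ["M", "F"]

# Within-subclass mean square from the raw totals.
n_total = sum(n for n, _ in cells.values())
within_ss = sum_sq_scores - sum(s * s / n for n, s in cells.values())
within_df = n_total - len(cells)
within_ms = within_ss / within_df

# Error mean square appropriate to an analysis of subclass means.
constant = sum(1 / n for n, _ in cells.values()) / len(cells)
error_ms = within_ms * constant

# Treat each subclass mean as a single observation in a 3 x 2 table.
means = {key: s / n for key, (n, s) in cells.items()}
grand = sum(means.values()) / len(means)
section_mean = {a: sum(means[(a, b)] for b in sexes) / len(sexes) for a in sections}
sex_mean = {b: sum(means[(a, b)] for a in sections) / len(sections) for b in sexes}

ss_section = len(sexes) * sum((m - grand) ** 2 for m in section_mean.values())
ss_sex = len(sections) * sum((m - grand) ** 2 for m in sex_mean.values())
ss_total = sum((m - grand) ** 2 for m in means.values())
ss_interaction = ss_total - ss_section - ss_sex

f_sex = (ss_sex / 1) / error_ms
f_section = (ss_section / 2) / error_ms
f_interaction = (ss_interaction / 2) / error_ms
print(f"sex {f_sex:.2f}  section {f_section:.2f}  interaction {f_interaction:.2f}")
```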

Two Samples Matched in Subgroups on a Related Trait. Very 
often the comparison of the means of two samples on a particular 
trait which we may call X is confused by the fact that the groups 
have a noticeably different distribution on some other trait, which 
we may call Y. If X and Y are independent, the difference between 
the groups on Y may be disregarded, but if X and Y are related, 
some method must be found of compensating for that difference. 
For example, suppose a research worker is interested in exploring 
possible effects upon the emotional adjustment of young children 
of living in an orphanage as compared with living in foster homes. 




If there should be a considerable difference in the age distribution 
of the two samples studied, a difference in measure of emotional 
adjustment between the groups which was in whole or part due 
to difference in age might be erroneously interpreted as due to 
difference in home life. A similar effect might arise if the two 
groups were unlike in distribution of intelligence quotients, or as 
to the proportion of children who had been diagnosed as problem 
cases before admission to the institution. However a difference in 
the proportion who were color-blind or left-handed, or who disliked 
spinach might reasonably be disregarded. 

A very common method of eliminating the effect of such a 
difference in extraneous background traits which may be presumed 
to be related to the trait under scrutiny is that of matching 
individual for individual on the background traits. In a study of 
the sort referred to in the preceding paragraph, for example, a 
research worker might attempt to find pairs of children such that 
the members of each pair were of the same sex, of the same age 
within a specified limit, and of the same intelligence quotient 
within a specified limit. The prevalence of defects or of problem 
cases might be eliminated from consideration by limiting the study 
to children not known to be defective or maladjusted before 
admission. Each pair thus obtained is treated as one individual 
for whom there are two measures. The difference between those 
two measures is found, and analysis proceeds as in the problem 
on page 152. 

There are several serious objections to this procedure. (1) It 
is usually very laborious. The search for closely matched cases 
often takes a very long time and the research worker quite properly 
feels that his energy could be better spent on something else. 
(2) Some of the cases, sometimes an alarmingly large number, 
have to be eliminated because no mates can be found. Thus 
sample size is reduced and reliability sacrificed. (3) Very often 
the cases finally retained at the conclusion of the matching process 
are not representative of either of the original populations.
Therefore generalization from the sample of matched pairs can be made
only to a universe of matched pairs and not to the two universes
as originally defined.

Sometimes, instead of matching case for case, the two samples
are made to have approximately the same mean and standard
deviation on the background traits. In this procedure it is not
necessary to have the two samples of the same size and fewer
cases are sacrificed. However the work of matching is still
tedious and the samples obtained are usually not representative of
the original populations. A method proposed by Johnson and
Neyman⁷ avoiding all three of the principal disadvantages of
case by case matching will now be described.

The usefulness of this method may be illustrated by application
to a set of data obtained by W. W. Biddle². He had an experimental
and a control group in each of six different schools, so that
in effect his research was carried out six times. These groups
varied in size. Certain environmental factors varied from school
to school, but were constant for the two groups in the same school.
Each group was given a gullibility test designed to measure the
ability to detect propaganda, the subject matter of this test being
the Pacific relations of the United States. This test was given
twice to all subjects, once a week or more before any experimental
material was presented to the experimental classes and once a
week or more after the close of the experimental teaching. The
gain in score made by each student is the individual trait studied.
Between the two administrations of the test, the experimental
classes studied specially prepared lessons entitled Manipulating
the Public, in which both subject matter and illustrations of
propaganda were taken largely from the period of World War I.
The control groups did not use this material. The data presented
in Table 14.22 differ somewhat from those in the appendix of
Biddle's study, being based on original data furnished by him.

The data of Table 14.22 form a two-way layout of $k$ rows
(6 schools) and 2 columns (2 methods) with disproportionate
frequencies in the $2k = 12$ cells. The two groups from one school
may be considered as samples from a pair of populations matched
with respect to school and environmental influences but differing
with respect to the experimental variable method. Such data
are likely to show significant variation among the school means.
However the focus of interest is not on school differences but on
method differences. Examination of the means in Table 14.22
shows that in each school the experimental group gained more
than the control, so the difference $\bar X_{i1} - \bar X_{i2}$ was positive. Do
these differences jointly indicate that the true population difference
is not zero?

TABLE 14.22 Gain in Measurement of Nationalist Gullibility for an
Experimental Group and a Control Group in Each of Six Schools.*

           Experimental group         Control group              Both groups
School     Frequency    Mean          Frequency    Mean          Frequency    Variance
           $N_{i1}$     $\bar X_{i1}$   $N_{i2}$     $\bar X_{i2}$   $N_i$        $s_i^2$
1             51        .496             26         .074            77         1.024
2             26        .202             35         .045            61          .856
3              8        .466             29         .315            37          .410
4             25        .246             12        -.123            37          .472
5             66        .241             16        -.245            82          .810
6             33        .450             29         .423            62          .817
Total        209                        147                        356

* Data from Biddle².

The Johnson-Neyman method is based on three assumptions: 

(1) Within each of the 2k cells the observations are from a 
normal population; 

(2) The population variance is the same for all cells; 

(3) For the two matched populations from one school the mean
difference is the same as for any other school, so $\mu_{i1} - \mu_{i2} = d$.
This assumption is equivalent to the assumption that interaction
of school and method is zero.

Since there are two columns, the column difference has only
one degree of freedom and the test for column difference can be
made by use of “Student’s” distribution. We compute

$$\bar{D} = \frac{\sum_{i=1}^{k} \dfrac{N_{i1}N_{i2}}{N_i}\,(\bar{X}_{i1} - \bar{X}_{i2})}{\sum_{i=1}^{k} \dfrac{N_{i1}N_{i2}}{N_i}} \tag{14.12}$$
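$$s_{\bar{D}}^2 = \frac{\sum_{i=1}^{k} N_i s_i^2 - \dfrac{\left[\sum_{i=1}^{k} \dfrac{N_{i1}N_{i2}}{N_i}\,(\bar{X}_{i1} - \bar{X}_{i2})\right]^2}{\sum_{i=1}^{k} \dfrac{N_{i1}N_{i2}}{N_i}}}{(N - k - 1)\sum_{i=1}^{k} \dfrac{N_{i1}N_{i2}}{N_i}} \tag{14.13}$$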


The statistic $t = \bar{D}/s_{\bar{D}}$ has “Student’s” distribution with $N - k - 1$
degrees of freedom, where N is the sum of all the cell frequencies.


For the data in Table 14.22

$$\bar{D} = .2707 \quad \text{and} \quad s_{\bar{D}}^2 = .01435$$

$$t = \frac{\bar{D}}{s_{\bar{D}}} = 2.26, \qquad t_{.975} = 1.96$$

The null hypothesis that there is no method difference may,
therefore, be rejected.

An interval estimate for the population difference $\mu_1 - \mu_2$ made
by the methods described in Chapter 7 is

$$C(.035 < \mu_1 - \mu_2 < .506) = .95$$
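The weighting in formula (14.12) can be checked with a short computation. The following is only a sketch in Python: the names `schools` and `weighted_mean_difference` are illustrative, the cell means are the rounded values of Table 14.22 (so the result agrees with the printed .2707 only to about three decimals), and the variance .01435 is taken from the text rather than recomputed.

```python
import math

# (N_i1, mean_i1, N_i2, mean_i2) for each of the six schools in Table 14.22.
schools = [
    (51, 0.496, 26, 0.074),
    (26, 0.202, 35, 0.045),
    (8, 0.466, 29, 0.315),
    (25, 0.246, 12, -0.123),
    (66, 0.241, 16, -0.245),
    (33, 0.450, 29, 0.423),
]


def weighted_mean_difference(cells):
    """Formula (14.12): each school difference is weighted by N_i1 * N_i2 / N_i."""
    weights = [n1 * n2 / (n1 + n2) for n1, _, n2, _ in cells]
    diffs = [m1 - m2 for _, m1, _, m2 in cells]
    return sum(w * d for w, d in zip(weights, diffs)) / sum(weights)


d_bar = weighted_mean_difference(schools)

# Values quoted in the text; the variance is not recomputed here.
d_bar_text = 0.2707
var_d_bar_text = 0.01435
s_d_bar = math.sqrt(var_d_bar_text)

t_ratio = d_bar_text / s_d_bar        # compare with 1.96
lower = d_bar_text - 1.96 * s_d_bar   # interval estimate, lower limit
upper = d_bar_text + 1.96 * s_d_bar   # interval estimate, upper limit

print(round(d_bar, 4), round(t_ratio, 2), round(lower, 3), round(upper, 3))
```

Because every school difference is positive, the weighted mean must be positive as well; the weights only decide how much each school contributes.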


REFERENCES

1. Anderson, R. L. and Bancroft, T. A., Statistical Theory in Research, New
York, 1952, McGraw-Hill Book Co., Inc., Chapters 18, 19, and 20.

2. Biddle, W. W., Propaganda and Education, New York, 1932, Teachers College,
Columbia University, Bureau of Publications.

3. Bliss, C. I., The Statistics of Bioassay, New York, 1952, Academic Press, Inc.,
Chapter 3.

4. Burt, Cyril and Lewis, R. B., “Teaching Backward Readers,” British Journal
of Educational Psychology, 16 (1946), 116-132.

5. Edwards, Allen L., Experimental Design in Psychological Research, New York,
1950, Rinehart and Co., Inc., Chapters 12 to 16.

6. Fisher, R. A., The Design of Experiments, Edinburgh and London, 4th ed.,
1945, Oliver and Boyd, Ltd.

7. Johnson, P. O. and Neyman, J., “Tests of Certain Linear Hypotheses and
their Application to Some Educational Problems,” Statistical Research Memoirs,
1 (1936), 57-93.

8. Mood, A. M., Introduction to the Theory of Statistics, New York, 1950,
McGraw-Hill Book Co., Inc., 334-348.

9. Rao, C. R., Advanced Statistical Methods in Biometric Research, New York,
1952, John Wiley and Sons, Inc.

10. Rubin-Rabson, Grace, The Influence of Analytical Pre-Study in Memorizing
Piano Music, Archives of Psychology, No. 220, 1937, page 503.

11. Schroeder, Elinor M., On Measurement of Motor Skills, New York, 1945,
King’s Crown Press.


15 Analysis of Covariance 


The analysis of covariance is employed in the com- 
parison of groups on one variable when information is available 
on another variable correlated with it, or on several such variables. 

The variable on which the comparisons are made will be 
denoted by the letter Y. The related variables, which will be 
called predictor variables, will be denoted by the letters X and Z. 
The analysis will use regression equations by which the Y values 
are estimated from known values of the predictor variables. The 
mathematical model will be that of regression analysis described 
in Chapter 10, so that the values of the predictor variables are 
the same for all samples in the population, but the Y values differ 
from sample to sample. 

The methods to be described involve a subdivision of the sum of 
cross-products into components similar to the subdivision of sums 
of squares in the analysis of variance. Because the sums of cross- 
products are related to covariance as sums of squares are related 
to variance, methods involving subdivisions of cross-products 
are classified under the general title of analysis of covariance. 

Symbolism in the Analysis of Covariance. To facilitate computation
we shall introduce a set of symbols formed by the letter C
with subscripts to be described. Generally we write

$$C_{xx} = \sum (X - \bar{X})^2 = \sum X^2 - (\sum X)^2/N \tag{15.1}$$

$$C_{yy} = \sum (Y - \bar{Y})^2 = \sum Y^2 - (\sum Y)^2/N \tag{15.2}$$

$$C_{xy} = \sum (X - \bar{X})(Y - \bar{Y}) = \sum XY - (\sum X)(\sum Y)/N \tag{15.3}$$

Additional subscripts are used to distinguish total variation,
variation within groups and variation between groups. The following
equations relate the usual partition of the total sum of
squares into portions within and between groups, to the C notation:


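$$\sum_{i=1}^{k}\sum_{\alpha=1}^{N_i} (X_{i\alpha} - \bar{X})^2 = \sum_{i=1}^{k}\sum_{\alpha=1}^{N_i} (X_{i\alpha} - \bar{X}_i)^2 + \sum_{i=1}^{k} N_i(\bar{X}_i - \bar{X})^2 \tag{15.4}$$

$$C_{xxT} = C_{xxw} + C_{xxb} \tag{15.5}$$

A similar relationship holds for the sum of cross products

$$\sum_{i=1}^{k}\sum_{\alpha=1}^{N_i} (X_{i\alpha} - \bar{X})(Y_{i\alpha} - \bar{Y}) = \sum_{i=1}^{k}\sum_{\alpha=1}^{N_i} (X_{i\alpha} - \bar{X}_i)(Y_{i\alpha} - \bar{Y}_i) + \sum_{i=1}^{k} N_i(\bar{X}_i - \bar{X})(\bar{Y}_i - \bar{Y}) \tag{15.6}$$

$$C_{xyT} = C_{xyw} + C_{xyb} \tag{15.7}$$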


SEVERAL POPULATIONS WITH ONE PREDICTOR VARIABLE 


In this situation, comparisons are made among means of a 
variable Y in several populations. Information is also available 
on a variable X. To make the problem concrete, consider the data
in Table 15.1. These are artificial data so contrived as to exhibit 
vividly the various relationships involved. Three groups, each 
consisting of 5 subjects have been given a prognostic test (X) 
before the beginning of a learning experiment, and an achievement 
test (Y) after the experiment. Each group is taught by a different 
method and is considered as a sample from a population identified 
by that method. We wish to test the hypothesis that the three 
methods are equally effective. 


TABLE 15.1 Prognostic Test Score (X) and Achievement Score (Y) for each of
Fifteen Subjects in a Learning Experiment Employing Three Methods (I, II, and III)

                 I                      II                     III          Sum
Subject     X      Y     Subject     X      Y     Subject     X      Y
1           2      5     6           14     7     11          20     20
2           4      8     7           16     8     12          18     22
3           5      7     8           15     10    13          23     26
4           8      9     9           19     13    14          25     28
5           6      11    10          11     12    15          24     24
$T_x = \sum X$     25                 75                      110          210
$\sum X^2$         145                1159                    2454         3758
$T_y = \sum Y$            40                 50                      120   210


The three samples have widely disparate means on the achievement
test ($\bar{Y}_1 = 8$, $\bar{Y}_2 = 10$, $\bar{Y}_3 = 24$), and a simple analysis of
variance solution applied to the Y scores yields F = 53.0, whereas
$F_{.95}$ is only 3.88. We note, however, that the groups have widely
disparate means on X also ($\bar{X}_1 = 5$, $\bar{X}_2 = 15$, $\bar{X}_3 = 22$). We may
ask whether the differences among the Y means can be explained
by the differences among X means. If such an explanation is 
valid, then the methods of instruction must be considered equally 
effective. The analysis related to this problem will be discussed 
in some detail. 
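The simple analysis of variance quoted above (F = 53.0) can be reproduced from the Y scores of Table 15.1. The sketch below assumes plain Python with no statistical library; the function name `one_way_f` is illustrative and not part of the text.

```python
# Achievement scores (Y) from Table 15.1, one list per teaching method.
groups = [
    [5, 8, 7, 9, 11],        # Method I
    [7, 8, 10, 13, 12],      # Method II
    [20, 22, 26, 28, 24],    # Method III
]


def one_way_f(samples):
    """Ratio of the between-groups to the within-groups mean square."""
    n_total = sum(len(g) for g in samples)
    k = len(samples)
    grand_mean = sum(sum(g) for g in samples) / n_total
    means = [sum(g) / len(g) for g in samples]
    between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(samples, means))
    within = sum((y - m) ** 2 for g, m in zip(samples, means) for y in g)
    return (between / (k - 1)) / (within / (n_total - k))


f_ratio = one_way_f(groups)
print(round(f_ratio, 1))
```

The ratio has 2 and 12 degrees of freedom, which is why it is compared with 3.88 in the text.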

Numerical Computation of C Values. The computations to be 
described will require all the C values listed in Table 15.2. Com- 
puting these from the data of Table 15.1 and setting them down 
in systematic fashion will facilitate the work. The computation 
of several of these will be carried out explicitly as in the illustra- 
tion. The student should verify the others. 


TABLE 15.2 Sums of Squares and Sums of Products Obtained
from the Data of Table 15.1

                                                           Within            Between           Total
Item          Group I          Group II         Group III        groups            groups
$\sum x^2$    $C_{xx1} = 20$   $C_{xx2} = 34$   $C_{xx3} = 34$   $C_{xxw} = 88$    $C_{xxb} = 730$   $C_{xxT} = 818$
$\sum xy$     $C_{xy1} = 15$   $C_{xy2} = 5$    $C_{xy3} = 30$   $C_{xyw} = 50$    $C_{xyb} = 650$   $C_{xyT} = 700$
$\sum y^2$    $C_{yy1} = 20$   $C_{yy2} = 26$   $C_{yy3} = 40$   $C_{yyw} = 86$    $C_{yyb} = 760$   $C_{yyT} = 846$


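For the data in Table 15.1,

$$C_{xx1} = 145 - \frac{(25)^2}{5} = 145 - 125 = 20$$

$$C_{xxT} = 3758 - \frac{(210)^2}{15} = 3758 - 2940 = 818$$

$$C_{xxw} = \sum_{i=1}^{3} C_{xxi} = 20 + 34 + 34 = 88$$

$$C_{xxb} = C_{xxT} - C_{xxw} = 818 - 88 = 730$$

also

$$C_{xxb} = \frac{T_{x1}^2 + T_{x2}^2 + T_{x3}^2}{5} - \frac{T_x^2}{15} = \frac{(25)^2 + (75)^2 + (110)^2}{5} - \frac{(210)^2}{15} = 3670 - 2940 = 730$$

$$C_{xy1} = 215 - \frac{(25)(40)}{5} = 215 - 200 = 15$$

$$C_{xyT} = 3640 - \frac{(210)(210)}{15} = 3640 - 2940 = 700$$

$$C_{xyw} = \sum_{i=1}^{3} C_{xyi} = 15 + 5 + 30 = 50$$

$$C_{xyb} = C_{xyT} - C_{xyw} = 650$$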
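The student who wishes to verify the remaining entries of Table 15.2 may find a mechanical check convenient. The following is a minimal sketch in Python; the names `data` and `c_values` are illustrative, and the between-groups values are obtained by subtraction as in the computations above.

```python
# (X, Y) pairs from Table 15.1, grouped by teaching method.
data = {
    "I": [(2, 5), (4, 8), (5, 7), (8, 9), (6, 11)],
    "II": [(14, 7), (16, 8), (15, 10), (19, 13), (11, 12)],
    "III": [(20, 20), (18, 22), (23, 26), (25, 28), (24, 24)],
}


def c_values(pairs):
    """Return (C_xx, C_xy, C_yy) by formulas (15.1) to (15.3)."""
    n = len(pairs)
    sum_x = sum(x for x, _ in pairs)
    sum_y = sum(y for _, y in pairs)
    c_xx = sum(x * x for x, _ in pairs) - sum_x ** 2 / n
    c_xy = sum(x * y for x, y in pairs) - sum_x * sum_y / n
    c_yy = sum(y * y for _, y in pairs) - sum_y ** 2 / n
    return c_xx, c_xy, c_yy


per_group = {name: c_values(pairs) for name, pairs in data.items()}
total = c_values([pair for pairs in data.values() for pair in pairs])
# Within-groups values are sums over groups; between = total - within.
within = tuple(sum(vals[j] for vals in per_group.values()) for j in range(3))
between = tuple(t - w for t, w in zip(total, within))

print(per_group, within, between, total)
```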




Comparison of Regression within the Groups with Regression 
for the Combined Group. If the regressions based on the separate 
groups are the same as that based on the total group, except for 
sampling variability, then the effects of different methods of 
instruction can be disregarded. Conversely, differences between 
those regressions not attributable to sampling error may be 
regarded as due to the methods of instruction. 

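The regression based on the total group is

$$\tilde{Y}_\alpha = \bar{Y} + b_T(X_\alpha - \bar{X}) \tag{15.8}$$

where

$$b_T = \frac{C_{xyT}}{C_{xxT}} \tag{15.9}$$

The regression for the ith group is

$$\tilde{Y}_{i\alpha} = \bar{Y}_i + b_i(X_{i\alpha} - \bar{X}_i) \tag{15.10}$$

where

$$b_i = \frac{C_{xyi}}{C_{xxi}} \tag{15.11}$$

For the data in Table 15.1 we have, using the computations in that
Table and in Table 15.2, for the regression based on the entire
group

$$\tilde{Y}_\alpha = 14 + .856(X_\alpha - 14)$$

and for the regressions based on the separate groups

$$\tilde{Y}_{1\alpha} = 8 + .75(X_{1\alpha} - 5)$$

$$\tilde{Y}_{2\alpha} = 10 + .147(X_{2\alpha} - 15)$$

$$\tilde{Y}_{3\alpha} = 24 + .882(X_{3\alpha} - 22)$$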
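The four slopes follow directly from the entries of Table 15.2. A brief sketch, in which the dictionary layout and the helper `estimate` are illustrative choices rather than anything prescribed by the text:

```python
# C values from Table 15.2; "T" denotes the total group.
c_xx = {"I": 20, "II": 34, "III": 34, "T": 818}
c_xy = {"I": 15, "II": 5, "III": 30, "T": 700}
mean_x = {"I": 5, "II": 15, "III": 22, "T": 14}
mean_y = {"I": 8, "II": 10, "III": 24, "T": 14}

# Formulas (15.9) and (15.11): slope = C_xy / C_xx.
slopes = {g: c_xy[g] / c_xx[g] for g in c_xx}


def estimate(group, x):
    """Regression estimate of Y for score x, as in (15.8) and (15.10)."""
    return mean_y[group] + slopes[group] * (x - mean_x[group])


print({g: round(b, 3) for g, b in slopes.items()})
print(estimate("I", 8))  # estimate for subject 4 from the Group I line
```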


A test of significance is called for to determine whether the 
differences between the regression equations may be ascribed to 
errors of sampling. Before proceeding with such a test we notice 
that the regression lines in the populations may differ for two 
reasons. (1) Their slopes may be different. (2) They may have 
the same slope, but they may be parallel rather than actually 
identical. 

Test of Hypothesis of Common Slope. We shall now describe
a test for the hypothesis that the slope of the regression line is the
same for all populations, that is

$$H_1: \beta_1 = \beta_2 = \beta_3 = \beta_w$$

The best sample estimate of that common within-groups slope
$\beta_w$ is provided by

$$b_w = \frac{C_{xyw}}{C_{xxw}} \tag{15.12}$$

For the data of Table 15.1,

$$b_w = \frac{50}{88} = .568$$


If $H_1$ is true, the sample values $b_1$, $b_2$ and $b_3$ differ only because of
sampling error. Then substituting $b_w$ for each of these in (15.10)
would produce a regression estimate $\bar{Y}_i + b_w(X_{i\alpha} - \bar{X}_i)$ differing
from $\bar{Y}_i + b_i(X_{i\alpha} - \bar{X}_i)$ only because of sampling variability. If
these differences

$$[\bar{Y}_i + b_w(X_{i\alpha} - \bar{X}_i)] - [\bar{Y}_i + b_i(X_{i\alpha} - \bar{X}_i)]$$

are squared and summed for all individuals in all groups the result
is, after simplification

$$S_1 = \sum_{i=1}^{k} \frac{C_{xyi}^2}{C_{xxi}} - \frac{C_{xyw}^2}{C_{xxw}} \tag{15.13}$$


To make a test of significance this sum of squares can be
compared with the sum of squares of deviations of individual
scores from the separate regression estimates. This is found as
the sum of squares of differences

$$Y_{i\alpha} - [\bar{Y}_i + b_i(X_{i\alpha} - \bar{X}_i)]$$

Calling this sum of squares $S_2$ we have, after simplification,

$$S_2 = C_{yyw} - \sum_{i=1}^{k} \frac{C_{xyi}^2}{C_{xxi}} \tag{15.14}$$

$S_1/\sigma^2$ and $S_2/\sigma^2$ are independently distributed as $\chi^2$ with $k - 1$
and $N - 2k$ degrees of freedom respectively. Hence the ratio

$$F = \frac{S_1}{S_2} \cdot \frac{N - 2k}{k - 1} \tag{15.15}$$

has the F distribution with $n_1 = k - 1$ and $n_2 = N - 2k$.
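For the illustrative data

$$S_1 = \left(\frac{15^2}{20} + \frac{5^2}{34} + \frac{30^2}{34}\right) - \frac{50^2}{88} = 10.0$$

$$S_2 = 86 - \left(\frac{15^2}{20} + \frac{5^2}{34} + \frac{30^2}{34}\right) = 47.5$$

$$N - 2k = 9, \qquad k - 1 = 2$$

Then

$$F = \frac{S_1}{S_2} \cdot \frac{9}{2} = .95 \quad \text{with } n_1 = 2 \text{ and } n_2 = 9,$$

so that F is not significant. The regression lines in the three
populations may be assumed to have the same slope, of which $b_w$ is
the best estimate.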
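The arithmetic of formulas (15.13) to (15.15) is easily mechanized. The sketch below uses the within-group C values of Table 15.2; the variable names are illustrative only.

```python
# Per-group values from Table 15.2 (Groups I, II, III).
c_xx = [20, 34, 34]
c_xy = [15, 5, 30]
c_yyw = 86
n_total, k = 15, 3

c_xxw = sum(c_xx)
c_xyw = sum(c_xy)

# Sum over groups of C_xyi^2 / C_xxi, which appears in both S1 and S2.
separate_lines = sum(xy ** 2 / xx for xy, xx in zip(c_xy, c_xx))

s1 = separate_lines - c_xyw ** 2 / c_xxw      # formula (15.13)
s2 = c_yyw - separate_lines                   # formula (15.14)
f_slopes = (s1 / (k - 1)) / (s2 / (n_total - 2 * k))  # formula (15.15)

print(round(s1, 2), round(s2, 2), round(f_slopes, 2))
```

Note that $S_1 + S_2$ depends on the data only through $C_{yyw}$, $C_{xyw}$ and $C_{xxw}$, since the sum over the separate groups cancels.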


Fig. 15-1. Paired scores for data of Table 15.1; the regression line for each
group; regression lines with common slope $b_w$ through group means; and the
differences in regression estimates resulting from use of $b_w$ instead of $b_i$.


At this point some readers will be trying to visualize $S_1$ and
$S_2$. On Figure 15-1 the paired scores of Table 15.1 have been
indicated by heavy dots. The three regression lines with slopes
$b_1$, $b_2$ and $b_3$ have been drawn through the means of their respective
groups. Three lines with common slope $b_w$ have been drawn
through those same means. For each individual the difference
in regression values obtained by using $b_w$ or $b_i$ is a line segment
shown on the graph as a dotted line. The sum of the squares
of all these lines is $S_1$. To obtain a picture of the line segments
whose squares constitute $S_2$, you should draw for each individual
his residual from the line with slope $b_i$. These residuals are drawn
parallel to the vertical axis extending from the dot which represents
the paired scores of an individual to the regression line for his
group. Some of them will overlap the dotted lines already drawn.

Test of the Hypothesis $\beta_w = 0$. It has been established that all
$\beta$’s may be treated as equal, but there remains the possibility that
they are all equal to 0. The test for the hypothesis that $\beta_w = 0$ is
provided by the variance ratio

$$F = \frac{C_{xyw}^2}{C_{xxw}C_{yyw} - C_{xyw}^2} \cdot \frac{N - 2k}{1} \tag{15.16}$$

which has the F distribution with $n_1 = 1$ and $n_2 = N - 2k$.
For our data

$$F = \frac{50^2}{(88)(86) - 50^2} \cdot \frac{9}{1} = 4.44$$

Inasmuch as $F_{.95} = 5.12$ we might decide to ignore the X scores
and to make our analysis on the Y scores alone. However, even
if such a course could be defended, with so few cases and F so near
the .05 value there is considerable risk of making a type-II error
and accepting the null hypothesis when it is false. Hence, it is
wiser to retain the X’s in the analysis.
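The ratio (15.16) for the illustrative data can be checked as follows. This is a sketch only; the function name `f_zero_slope` is illustrative, and the degrees of freedom are passed in exactly as the text uses them.

```python
def f_zero_slope(c_xx, c_yy, c_xy, df):
    """Variance ratio (15.16) for the hypothesis that the common slope is zero."""
    return c_xy ** 2 / (c_xx * c_yy - c_xy ** 2) * df


# Within-groups C values from Table 15.2, with N - 2k = 9.
f_ratio = f_zero_slope(c_xx=88, c_yy=86, c_xy=50, df=9)
print(round(f_ratio, 2))
```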

Measure of Sampling Variability. The sum of the squares of
residuals from group regression lines divided by its degrees of
freedom is an unbiased estimate of the error variance. This value
is

$$\frac{S_2}{N - 2k} = \frac{1}{N - 2k}\left[C_{yyw} - \sum_{i=1}^{k} \frac{C_{xyi}^2}{C_{xxi}}\right] \tag{15.17}$$

which was used as error variance in (15.15) to test the hypothesis
$\beta_1 = \beta_2 = \cdots = \beta_k$. By accepting that hypothesis we established
that $S_1/(k - 1)$ may also be considered an estimate of the error variance,
and therefore that $S_1$ and $S_2$ may be pooled to obtain an estimate
with greater number of degrees of freedom:

$$S_w = S_1 + S_2 = C_{yyw} - \frac{C_{xyw}^2}{C_{xxw}} \tag{15.18}$$

and $S_w/\sigma^2$ is distributed as $\chi^2$ with $N - k - 1$ degrees of freedom.
In subsequent tests $S_w/(N - k - 1)$ will be used as estimate of
error variance. See Figure 15-2 for a graphic interpretation of $S_w$.

Fig. 15-2. Paired scores for data of Table 15.1; regression lines with common
slope $b_w$ through group means; and residuals from those regression lines.

Test of Hypothesis that a Single Regression Line Fits All
Populations. If it is true that a single common line fits all of the
populations, the regression estimates based on deviations within
groups from a line with slope $b_w$ [that is, $\bar{Y}_i + b_w(X_{i\alpha} - \bar{X}_i)$]
should be very similar to the regression estimates based on deviations
from the common mean and the regression line with slope $b_T$
[that is, $\bar{Y} + b_T(X_{i\alpha} - \bar{X})$]. The differences between these two

$$[\bar{Y}_i + b_w(X_{i\alpha} - \bar{X}_i)] - [\bar{Y} + b_T(X_{i\alpha} - \bar{X})]$$

would then be due to sampling error. The sum of the squares of
these differences may be reduced to

$$S_b = C_{yyb} + \frac{C_{xyw}^2}{C_{xxw}} - \frac{C_{xyT}^2}{C_{xxT}} \tag{15.19}$$

and $S_b/\sigma^2$ is distributed as $\chi^2$ with $k - 1$ degrees of freedom. If
the hypothesis of a single regression line is true, $S_b/(k - 1)$ is
another estimate of sampling variance and will not be very different
in size from $S_w/(N - k - 1)$. If the hypothesis is not true,
$S_b/(k - 1)$ is expected to be larger than $S_w/(N - k - 1)$, which
is a proper estimate of error regardless of whether the hypothesis
is true or not. Then the quotient

$$F = \frac{S_b/(k - 1)}{S_w/(N - k - 1)} \tag{15.20}$$

has the F distribution with $n_1 = k - 1$ and $n_2 = N - k - 1$. It may
be used to test the hypothesis of a single regression line for all
populations. For the illustrative data


$$S_b = 760 + \frac{50^2}{88} - \frac{700^2}{818} = 189.4$$

$$S_w = 86 - \frac{50^2}{88} = 57.6$$

$$F = \frac{189.4/2}{57.6/11} = 18.1$$

For $n_1 = 2$ and $n_2 = 11$, $F_{.99}$ is only 7.20. The F ratio indicates
strongly that the hypothesis of a common regression line should
be rejected. In other words, the differences in achievement cannot
be wholly explained by differences in original ability but must be
at least partly ascribed to differences in the effectiveness of teaching
method.
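The three quantities just computed follow from formulas (15.18) to (15.20) and the entries of Table 15.2. A short sketch, with illustrative variable names:

```python
n_total, k = 15, 3

# C values from Table 15.2.
c_yyw, c_xyw, c_xxw = 86, 50, 88
c_yyb = 760
c_xyT, c_xxT = 700, 818

s_w = c_yyw - c_xyw ** 2 / c_xxw                           # formula (15.18)
s_b = c_yyb + c_xyw ** 2 / c_xxw - c_xyT ** 2 / c_xxT      # formula (15.19)
f_single_line = (s_b / (k - 1)) / (s_w / (n_total - k - 1))  # formula (15.20)

print(round(s_b, 1), round(s_w, 1), round(f_single_line, 1))
```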

Total Sum of Squares. An interesting sidelight on the sums of
squares $S_w$ and $S_b$ is that their sum is equal to the sum of squares
of deviations of individual Y values from the regression estimates
for Y if all groups are combined and the regression line has slope
$b_T = C_{xyT}/C_{xxT}$. Hence the relationship

$$S_T = S_w + S_b \tag{15.21}$$

corresponds to a subdivision of the total sum of squares into
portions as has been described in the analysis of variance. $S_T$ can
be computed independently of $S_w$ and $S_b$ by the formula

$$S_T = C_{yyT} - \frac{C_{xyT}^2}{C_{xxT}} \tag{15.22}$$

For the illustrative data $S_T$ is

$$846 - \frac{(700)^2}{818} = 246.98$$

while

$$S_w + S_b = 57.6 + 189.4 = 247.0$$

The sums of squares which are components of $S_T$ are listed with
their formulas in Table 15.3.


TABLE 15.3 Summary of Components of Sum of Squares of Y about Regression
Line $\tilde{Y}_\alpha = \bar{Y} + b_T(X_\alpha - \bar{X})$ with the Formula and the Number of Degrees
of Freedom for Each

Nature of variation                            Symbol               Formula                                                                       Degrees of freedom

Group regression coefficients                  $S_1$                $\sum_{i=1}^{k} \frac{C_{xyi}^2}{C_{xxi}} - \frac{C_{xyw}^2}{C_{xxw}}$         $k - 1$
about common coefficient

Scores about regression line                   $S_2$                $C_{yyw} - \sum_{i=1}^{k} \frac{C_{xyi}^2}{C_{xxi}}$                           $N - 2k$
for their own group

Group means $\bar{Y}_i$ about regression       $S_3$                $C_{yyb} - \frac{C_{xyb}^2}{C_{xxb}}$                                          $k - 2$
line based on means

Difference between regression                  $S_4$                $\frac{C_{xyb}^2}{C_{xxb}} + \frac{C_{xyw}^2}{C_{xxw}} - \frac{C_{xyT}^2}{C_{xxT}}$     1
coefficient based on means
and common regression coefficient
within groups

Scores about regression lines                  $S_w = S_1 + S_2$    $C_{yyw} - \frac{C_{xyw}^2}{C_{xxw}}$                                          $N - k - 1$
with common slope $b_w$

Group means about regression                   $S_b = S_3 + S_4$    $C_{yyb} + \frac{C_{xyw}^2}{C_{xxw}} - \frac{C_{xyT}^2}{C_{xxT}}$              $k - 1$
line with slope $b_T$

Scores about regression line                   $S_T$                $C_{yyT} - \frac{C_{xyT}^2}{C_{xxT}}$                                          $N - 2$
for total group
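The additive relations among the entries of Table 15.3 can be confirmed numerically for the illustrative data. The sketch below is one possible arrangement; the function `covariance_components` and the dictionary keys are illustrative names, not notation from the text.

```python
def covariance_components(c, n_total, k):
    """Return {symbol: (sum of squares, degrees of freedom)} as in Table 15.3.

    `c` holds the C values; "xx_i" and "xy_i" are lists with one entry per group.
    """
    per_group = sum(xy ** 2 / xx for xy, xx in zip(c["xy_i"], c["xx_i"]))
    within = c["xy_w"] ** 2 / c["xx_w"]
    between = c["xy_b"] ** 2 / c["xx_b"]
    total = c["xy_T"] ** 2 / c["xx_T"]
    return {
        "S1": (per_group - within, k - 1),
        "S2": (c["yy_w"] - per_group, n_total - 2 * k),
        "S3": (c["yy_b"] - between, k - 2),
        "S4": (between + within - total, 1),
        "Sw": (c["yy_w"] - within, n_total - k - 1),
        "Sb": (c["yy_b"] + within - total, k - 1),
        "ST": (c["yy_T"] - total, n_total - 2),
    }


# C values from Table 15.2.
table_15_2 = {
    "xx_i": [20, 34, 34], "xy_i": [15, 5, 30],
    "xx_w": 88, "xy_w": 50, "yy_w": 86,
    "xx_b": 730, "xy_b": 650, "yy_b": 760,
    "xx_T": 818, "xy_T": 700, "yy_T": 846,
}

parts = covariance_components(table_15_2, n_total=15, k=3)
for symbol, (ss, df) in parts.items():
    print(symbol, round(ss, 2), df)
```

Both the sums of squares and the degrees of freedom add: $S_1 + S_2 = S_w$, $S_3 + S_4 = S_b$, and $S_w + S_b = S_T$.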


Regression among the Means. If there is a common regression
line for all populations, then the pair of means for each population
must lie on that line. These means are $\bar{X}_i$ and $\mu_i$, where
$\mu_i = E(\bar{Y}_i)$ is the mean of Y for the ith population. However,
even if there is no common regression line for all populations
the means may still lie on a regression line determined by
those means as in the adjacent sketch. Here the three
solid lines have slope $b_w$ and the dotted line is the regression line for means. This line would
be estimated by formula

$$\tilde{Y}_i = \bar{Y} + b_m(\bar{X}_i - \bar{X}) \tag{15.23}$$

where

$$b_m = \frac{C_{xyb}}{C_{xxb}} \tag{15.24}$$

The sum of the squares of the deviations of these means from their
estimate by (15.23) reduces to

$$S_3 = C_{yyb} - \frac{C_{xyb}^2}{C_{xxb}} \tag{15.25}$$

and $S_3/\sigma^2$ has a $\chi^2$ distribution with $k - 2$ degrees of freedom. The
more closely the means cluster around a straight line, the smaller
$S_3$ will be.

A test of the hypothesis that regression among the means is
linear is provided by the ratio

$$F = \frac{S_3/(k - 2)}{S_w/(N - k - 1)} \tag{15.26}$$

which has the F distribution with $n_1 = k - 2$ and $n_2 = N - k - 1$.
For the illustrative data,

$$S_3 = 760 - \frac{650^2}{730} = 181.2$$

and

$$F = \frac{181.2}{1} \cdot \frac{11}{57.6} = 34.6$$

This value is much greater than $F_{.99} = 9.65$. It must be concluded
that regression among means is not linear (see Figure 15-3).
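The test of linearity among the means uses only between-groups C values and the pooled error term. A minimal sketch, with illustrative variable names:

```python
n_total, k = 15, 3

# Between-groups and within-groups C values from Table 15.2.
c_yyb, c_xyb, c_xxb = 760, 650, 730
c_yyw, c_xyw, c_xxw = 86, 50, 88

slope_of_means = c_xyb / c_xxb                  # formula (15.24)
s3 = c_yyb - c_xyb ** 2 / c_xxb                 # formula (15.25)
s_w = c_yyw - c_xyw ** 2 / c_xxw                # formula (15.18)
f_linearity = (s3 / (k - 2)) / (s_w / (n_total - k - 1))  # formula (15.26)

print(round(slope_of_means, 3), round(s3, 1), round(f_linearity, 1))
```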

Comparison of the Adjusted Means. In the illustrative data
the differences among the means of Y ($\bar{Y}_1$, $\bar{Y}_2$, $\bar{Y}_3$) are in part due to
differences among teaching methods and in part to differences in
the X means. To ascertain what the differences among Y means
are because of differences among teaching methods alone, the
effects of differences among X means should be eliminated. This
elimination is achieved by adjusting all the means to a common
X value, which, for convenience, may be taken as the mean of
all X values, $\bar{X}$.

The adjustment is accomplished numerically by subtracting
from each mean the amount it gains through being associated with
an X mean which is above $\bar{X}$, or the loss through being associated
with an X mean below $\bar{X}$. The adjustment is $b_w(\bar{X}_i - \bar{X})$. The
adjustments are shown in Table 15.4.

The adjusted mean is least for Group II and greatest for
Group III when all X means are adjusted to $\bar{X}$. The tests of
significance have indicated that this difference cannot be explained
by sampling variability.


Fig. 15-3. Means of groups indicated by crosses; regression line determined
by those means; and residuals of means from regression line.
$S_3$ = 5 (sum of squares of dotted lines).


TABLE 15.4 Adjustment of Y Means for the Three Groups in Table 15.1

Group       Observed mean              Adjustment                    Adjusted Y mean
            $\bar{Y}_i$    $\bar{X}_i$       $b_w(\bar{X}_i - \bar{X})$          $\bar{Y}_i - b_w(\bar{X}_i - \bar{X})$
I           8        5            −5.13                         13.13
II          10       15           .57                           9.43
III         24       22           4.56                          19.44
Combined    14       14

MATCHED REGRESSION ESTIMATES. TWO POPULATIONS 
WITH ONE PREDICTOR VARIABLE 


If the categorical trait is dichotomous, a method of analysis is 
available which is related to the foregoing but which has two great 
advantages over it. (1) This procedure does not depend upon the
assumption that $\beta_1 = \beta_2$, and therefore it can be used in situations
where the former would not apply. (2) By this procedure it is
possible not only to explore the question of whether one of the 
two populations (or methods, or treatments) exceeds the other 
on the average but also to ask for what values of the predictor 
variable it does so. 

In the situation considered here, there is one predictor vari- 
able X, and two populations; the regression of Y on X is assumed 
to be linear but not necessarily the same for the two populations; 
and Y is assumed to be normally distributed with constant vari- 
ance for each value of X. On page 406 we shall consider a similar 
problem with two predictor variables.

Significance of the Difference $\tilde{Y}_1 - \tilde{Y}_2$ for a Particular Value
of X. From the data, separate regression estimates for Y can be
obtained from each group. These are

$$\tilde{Y}_1 = a_1 + b_1X_1 \quad \text{and} \quad \tilde{Y}_2 = a_2 + b_2X_2 \tag{15.27}$$

where

$$b_i = \frac{C_{xyi}}{C_{xxi}} \quad \text{and} \quad a_i = \bar{Y}_i - b_i\bar{X}_i$$

Now consider a specific value of X which may for convenience be
called X′. For this value the difference in regression estimates
for the two groups is

$$D = \tilde{Y}_1 - \tilde{Y}_2 = (a_1 - a_2) + (b_1 - b_2)X' \tag{15.28}$$

The adjacent sketch illustrates the two regression lines and the
value D for a selected X′. Obviously D is a variable which depends
upon X. At the point X′ the two groups may be said to be
matched with respect to X, so that a comparison between $\tilde{Y}_1$
and $\tilde{Y}_2$ for X′ is legitimate. If D should be zero for the entire range
of X, then clearly the two regression lines coincide and the groups
are indistinguishable with respect to Y.

If the lines cross, the value $X_0$ corresponding to the point where
they cross is the point at which D = 0 for the sample. This point
will be called the point of nonsignificance. The purpose of this
analysis is to define a region on the scale of X in which the value
of D is non-significant. This region will have $X_0$ somewhere
within it, probably near its center. On one side of this region
of nonsignificance there may be a region of significance in which
$\tilde{Y}_1 > \tilde{Y}_2$ and on the other side a region of significance in which
$\tilde{Y}_1 < \tilde{Y}_2$. Whether there are two, one, or no region of significance
depends upon the size and distribution of the sample as well as
upon the position of the regression lines. If cases are few and the
lines close together the region of non-significance may extend over
all the observed values of X.

The variance of D depends upon the variability of the groups,
the size of $N_1$ and $N_2$, the relation between X and Y, and the
distance of X′ from $\bar{X}_1$ and from $\bar{X}_2$. That variance is given by
the formula

$$s_D^2 = \frac{\sum_{i=1}^{2}\left(C_{yyi} - \dfrac{C_{xyi}^2}{C_{xxi}}\right)}{N_1 + N_2 - 4}\left[\frac{N}{N_1N_2} + \sum_{i=1}^{2}\frac{(X' - \bar{X}_i)^2}{C_{xxi}}\right] \tag{15.29}$$

For a fixed value of X′ the ratio

$$t = \frac{D}{s_D}$$

can be calculated and its significance determined by reference to
“Student’s” distribution with $N_1 + N_2 - 4 = N - 4$ degrees of
freedom.
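Formulas (15.28) and (15.29) may be sketched in code as follows. The class `GroupStats` and the function `t_at` are illustrative names; the statistics entered are those of Table 15.6 for the therapy and non-therapy groups, and the choice X′ = 0 is made here only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class GroupStats:
    n: int
    mean_x: float
    mean_y: float
    c_xx: float
    c_yy: float
    c_xy: float

    @property
    def slope(self):
        return self.c_xy / self.c_xx

    @property
    def intercept(self):
        return self.mean_y - self.slope * self.mean_x

    @property
    def residual_ss(self):
        return self.c_yy - self.c_xy ** 2 / self.c_xx


def t_at(x, g1, g2):
    """Return (D, s_D, t) at the predictor value x, by (15.28) and (15.29)."""
    d = (g1.intercept - g2.intercept) + (g1.slope - g2.slope) * x
    df = g1.n + g2.n - 4
    error_variance = (g1.residual_ss + g2.residual_ss) / df
    spread = (
        (g1.n + g2.n) / (g1.n * g2.n)
        + (x - g1.mean_x) ** 2 / g1.c_xx
        + (x - g2.mean_x) ** 2 / g2.c_xx
    )
    s_d = math.sqrt(error_variance * spread)
    return d, s_d, d / s_d


therapy = GroupStats(8, 0.03125, 1.0125, 0.4197, 2.2038, 0.6044)
non_therapy = GroupStats(10, -0.19, 0.223, 0.6718, 0.1986, -0.0592)

d, s_d, t = t_at(0.0, therapy, non_therapy)
print(round(d, 3), round(s_d, 3), round(t, 2))
```

The resulting t would be referred to “Student’s” distribution with 14 degrees of freedom.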

Region of Significance. The procedure outlined thus far necessitates
calculation of significance separately for each value of X′.
Sometimes there is such a single value of X in which we are particularly
interested and for which we wish to explore the difference
in $\tilde{Y}_1$ and $\tilde{Y}_2$. More often we wish an answer to the question,
“For what values of X is the difference $\tilde{Y}_1 - \tilde{Y}_2$ significant?”
Clearly, if two regression lines differ significantly at one point X′,
they must differ significantly in some neighborhood near X′.
It is interesting and valuable to determine such a neighborhood,
which may be called the region of significance. This region will
now be determined.

Both D and $s_D$ are functions of the variable X. (We shall now
drop the prime for convenience.) The ratio $D/s_D$ has the t distribution
for each fixed value of X. Hence those values of X for which

$$\frac{D}{s_D} > t_{1-\alpha}$$

provide a one-tailed region of significance $\alpha$ with positive values of
$\tilde{Y}_1 - \tilde{Y}_2$ critical; those values for which

$$\frac{D}{s_D} < t_{\alpha}$$

provide a one-tailed region of significance $\alpha$ with negative values
of $\tilde{Y}_1 - \tilde{Y}_2$ critical.

A two-sided region with significance level $\alpha$ is determined by
values of X for which $D^2/s_D^2 > t_{1-\alpha/2}^2$. We shall now work out the
two-sided region, but all that is needed to obtain the one-sided
region is to substitute $t_{1-\alpha}$ for $t_{1-\alpha/2}$ and to allocate critical values to the
appropriate end of the scale.

The two-sided region of significance consists of those values
of X which satisfy the inequality

$$D^2 - t_{1-\alpha/2}^2\, s_D^2 \ge 0 \tag{15.30}$$

This is an inequality in X since $D = a_1 - a_2 + (b_1 - b_2)X$ and it
may be written

$$AX^2 + 2BX + C \ge 0 \tag{15.31}$$


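where

$$A = -\frac{t_{1-\alpha/2}^2}{N - 4}\left[\sum_{i=1}^{2}\left(C_{yyi} - \frac{C_{xyi}^2}{C_{xxi}}\right)\right]\left(\frac{1}{C_{xx1}} + \frac{1}{C_{xx2}}\right) + (b_1 - b_2)^2 \tag{15.32}$$

$$B = \frac{t_{1-\alpha/2}^2}{N - 4}\left[\sum_{i=1}^{2}\left(C_{yyi} - \frac{C_{xyi}^2}{C_{xxi}}\right)\right]\left(\frac{\bar{X}_1}{C_{xx1}} + \frac{\bar{X}_2}{C_{xx2}}\right) + (a_1 - a_2)(b_1 - b_2) \tag{15.33}$$

$$C = -\frac{t_{1-\alpha/2}^2}{N - 4}\left[\sum_{i=1}^{2}\left(C_{yyi} - \frac{C_{xyi}^2}{C_{xxi}}\right)\right]\left(\frac{N}{N_1N_2} + \frac{\bar{X}_1^2}{C_{xx1}} + \frac{\bar{X}_2^2}{C_{xx2}}\right) + (a_1 - a_2)^2 \tag{15.34}$$

The value of X for which D = 0 is

$$X_0 = \frac{a_2 - a_1}{b_1 - b_2} \tag{15.35}$$

Bounding values for the region of significance may be obtained
by solving the equation $AX^2 + 2BX + C = 0$ where the numerical
values of A, B, and C are obtained from formulas (15.32) to
(15.34). The solution is

$$X = \frac{-B \pm \sqrt{B^2 - AC}}{A} \tag{15.36}$$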

If $B^2 - AC > 0$, the equation has two solutions and those solutions
are bounding values such that the region of non-significance
lies between them, the region of significance outside them. If
$B^2 - AC < 0$, there is no region of significance. It is highly
unlikely that $B^2 - AC$ will be exactly zero.

Warning. Because formula (15.31) has been written with 2B 
instead of B as the coefficient of X, the solution given in (15.36) 
differs slightly from the form which may be familiar to the reader. 

Data reported by Bills² may be used to clarify the procedure.
Eighteen slow readers in the third grade were studied for three
periods of 30 days each. The first period was a control period, 
the second was the therapy period, the third was a period for 
studying cumulative effects of therapy. All the children were 
given reading tests at the beginning and end of the first period, 
and intelligence tests during the period, but received no other 
experimental treatment. During the therapy period, 8 of the 
children received play therapy, the other 10 did not. Reading 
tests were given at the end of the therapy period. Nothing was 
done with the children during the third period but they were 
again tested at the end. The variable we shall use as criterion 
will be the gain in reading score during the therapy period. Several 
predictor variables are available such as chronological age, mental 
age, initial reading score, but we shall at this time use only one, 
the gain in reading score during the control period. The raw data
for the analysis are shown in Table 15.5 and the derived data in
Table 15.6. Here X is used to represent gain during control period
and Y to represent gain during therapy period.


TABLE 15.5 Gain on Reading Test for Each Child during
Control and Therapy Periods*

         I. Therapy group                            II. Non-therapy group
Child    Gain during       Gain during       Child   Gain during       Gain during
         control period    therapy period            control period    therapy period
1         .00               .80              1        .00               .20
2         .10              1.35              2       −.30              −.05
3         .10               .90              3        .00               .45
4        −.45               .45              4        .33               .15
5        −.20               .65              5       −.33               .28
6         .25              2.05              6        .00               .20
7         .25              1.45              7       −.60               .45
8         .20               .45              8       −.45               .15
                                             9       −.20               .15
                                             10      −.35               .25

* Data taken from Bills, R. E.²


TABLE 15.6 Statistics Computed from the Data of Table 15.5
for Problem Using Method of Matched Regression Estimates

                    I                  II
                    Therapy group      Non-therapy group
$N_i$               8                  10
$T_y = \sum Y$      8.10               2.23
$\bar{Y}_i$         1.0125             .223
$T_x = \sum X$      .25                −1.90
$\bar{X}_i$         .03125             −.19
$C_{xxi}$           .4197              .6718
$C_{yyi}$           2.2038             .1986
$C_{xyi}$           .6044              −.0592
$b_i$               1.4400             −.0881
$a_i$               .9675              .2063


On Figure 15-4 the 8 small circles represent the paired scores
for the 8 children in the Therapy group. The regression equation
for this group

$$\tilde{Y}_1 = .97 + 1.44X_1$$

is also drawn. The 10 small crosses on the same figure represent
the paired scores for the 10 children in the Non-therapy group.
The regression line for this group

$$\tilde{Y}_2 = .21 - .088X_2$$

appears to the eye strikingly different from the first one. The
point of non-significance is

$$X_0 = \frac{.2063 - .9675}{1.440 + .088} = -.50$$

and on Figure 15-4 the two regression lines are seen to cross at a
point for which X = −.50 and Y = .25.

Substitution of the statistics (Table 15.6) in Formulas (15.32)
to (15.34) with $\alpha$ = .05, yields

$$A = .40, \quad B = 1.06, \quad C = .44, \quad X_1 = -5.0, \quad X_2 = -.22.$$

There is a region of significance for X > −.22, and in that region
$\tilde{Y}_1 > \tilde{Y}_2$. The other region of significance, for which X < −5.0,
is of no practical interest because it is entirely outside the range of
the observed data. It may be concluded that children who gained
during the initial period or lost not more than .22 will make higher
gains in a subsequent period if they have therapy. For other
children, the evidence does not indicate whether they will do
better under therapy or not.
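The substitution into formulas (15.32) to (15.36) may be retraced numerically. The sketch below takes the statistics of Table 15.6 and assumes 2.145 as the two-sided .05 value of t for 14 degrees of freedom; the variable names are illustrative. Because unrounded coefficients are carried, the lower bound comes out near −5.2 rather than the −5.0 quoted in the text.

```python
import math

t_crit = 2.145            # assumed value of t(.975) for N - 4 = 14 df
n1, n2 = 8, 10
n = n1 + n2

# Statistics from Table 15.6 (therapy group first).
mean_x = (0.03125, -0.19)
mean_y = (1.0125, 0.223)
c_xx = (0.4197, 0.6718)
c_yy = (2.2038, 0.1986)
c_xy = (0.6044, -0.0592)

b = [c_xy[i] / c_xx[i] for i in range(2)]
a = [mean_y[i] - b[i] * mean_x[i] for i in range(2)]
residual_ss = sum(c_yy[i] - c_xy[i] ** 2 / c_xx[i] for i in range(2))
factor = t_crit ** 2 * residual_ss / (n - 4)

# Formulas (15.32) to (15.34).
coef_a = -factor * (1 / c_xx[0] + 1 / c_xx[1]) + (b[0] - b[1]) ** 2
coef_b = factor * (mean_x[0] / c_xx[0] + mean_x[1] / c_xx[1]) + (a[0] - a[1]) * (b[0] - b[1])
coef_c = -factor * (n / (n1 * n2) + mean_x[0] ** 2 / c_xx[0] + mean_x[1] ** 2 / c_xx[1]) + (a[0] - a[1]) ** 2

x0 = (a[1] - a[0]) / (b[0] - b[1])        # formula (15.35)

discriminant = coef_b ** 2 - coef_a * coef_c
if discriminant > 0:
    root = math.sqrt(discriminant)        # formula (15.36)
    bounds = sorted(((-coef_b - root) / coef_a, (-coef_b + root) / coef_a))
else:
    bounds = []                           # no region of significance

print(round(coef_a, 2), round(coef_b, 2), round(coef_c, 2))
print(round(x0, 2), [round(x, 2) for x in bounds])
```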


Fig. 15-4. Regression of gain during therapy period on gain during
control period for two groups of children. (Data from Bills.)
Vertical axis: Y = gain during therapy period. Horizontal axis:
X = gain during control period.

o = child in therapy group
x = child in non-therapy group

$X_0 = -.50$, point of non-significance
$X > -.22$ is region of significance favoring therapy group

SEVERAL POPULATIONS WITH TWO PREDICTOR VARIABLES 


This situation is a natural extension of that described on
page 388 in which there was a single scaled predictor. However,
because the regression equation now involves two predictor
variables, changes must be made in the computational pattern
and in the degrees of freedom.

The total sum of squares $S_T$ which is to be analyzed is the sum
of the squares of the residuals $Y_\alpha - \tilde{Y}_\alpha$ where

$$\tilde{Y}_\alpha = \bar{Y} + b_{yx}(X_\alpha - \bar{X}) + b_{yz}(Z_\alpha - \bar{Z})$$

This is the equation of a regression plane as discussed in Chapter
13. The three means $\bar{Y}$, $\bar{X}$ and $\bar{Z}$ are the means of the total group.
The two regression coefficients are also computed for the total
group. (Here $b_{yx}$ and $b_{yz}$ have been used as abbreviations for $b_{yx.z}$
and $b_{yz.x}$.) In a problem in analysis of covariance it is not usually
necessary to compute the regression equation but only to obtain $S_T$.
If it were desirable to compute the regression coefficients, that
could be easily done by the Formulas

$$b_{yx} = \frac{C_{xyT}C_{zzT} - C_{yzT}C_{xzT}}{C_{xxT}C_{zzT} - C_{xzT}^2}, \qquad b_{yz} = \frac{C_{yzT}C_{xxT} - C_{xyT}C_{xzT}}{C_{xxT}C_{zzT} - C_{xzT}^2} \tag{15.37}$$

which are algebraically equivalent to Formulas (13.2) on page 319.
The total sum of squares computed as

$$S_T = C_{yyT} - \frac{C_{xyT}^2C_{zzT} + C_{yzT}^2C_{xxT} - 2C_{xyT}C_{yzT}C_{xzT}}{C_{xxT}C_{zzT} - C_{xzT}^2} \tag{15.38}$$

has $N - 3$ degrees of freedom.
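That formula (15.38) gives the sum of squared residuals about the regression plane can be checked on a small example. The scores below are hypothetical, invented only for this check, and the function names `c_sum` and `plane_fit` are illustrative.

```python
def c_sum(u, v):
    """Sum of products of deviations, as in formulas (15.1) to (15.3)."""
    n = len(u)
    return sum(p * q for p, q in zip(u, v)) - sum(u) * sum(v) / n


def plane_fit(x, z, y):
    """Return (b_yx, b_yz, S_T) by formulas (15.37) and (15.38)."""
    c_xx, c_zz, c_xz = c_sum(x, x), c_sum(z, z), c_sum(x, z)
    c_xy, c_yz, c_yy = c_sum(x, y), c_sum(z, y), c_sum(y, y)
    det = c_xx * c_zz - c_xz ** 2
    b_yx = (c_xy * c_zz - c_yz * c_xz) / det
    b_yz = (c_yz * c_xx - c_xy * c_xz) / det
    s_t = c_yy - (c_xy ** 2 * c_zz + c_yz ** 2 * c_xx - 2 * c_xy * c_yz * c_xz) / det
    return b_yx, b_yz, s_t


# Hypothetical scores for six subjects; not data from the text.
x = [1, 2, 3, 4, 5, 6]
z = [2, 1, 4, 3, 6, 5]
y = [3, 4, 8, 7, 12, 11]

b_yx, b_yz, s_t = plane_fit(x, z, y)

# Direct computation of the residual sum of squares for comparison.
mean_x, mean_z, mean_y = sum(x) / 6, sum(z) / 6, sum(y) / 6
residual_ss = sum(
    (yi - (mean_y + b_yx * (xi - mean_x) + b_yz * (zi - mean_z))) ** 2
    for xi, zi, yi in zip(x, z, y)
)

print(round(s_t, 6), round(residual_ss, 6))
```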

As on page 395, the total sum of squares may be separated
into two components $S_w$ and $S_b$ and each of these separated into
two others:

$$S_w = S_1 + S_2$$

$$S_b = S_3 + S_4$$

$$S_T = S_w + S_b = S_1 + S_2 + S_3 + S_4$$

Under the null hypothesis, each of these component sums divided
by its respective degrees of freedom is an unbiased estimate of
the common variance. The formulas for computing the various
sums and the degrees of freedom associated with each are shown in
Table 15.7.


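TABLE 15.7 Summary of Components of Sum of Squares of Y about Regression
Plane $\tilde{Y}_\alpha = \bar{Y} + b_{yx}(X_\alpha - \bar{X}) + b_{yz}(Z_\alpha - \bar{Z})$ with the Formula and Degrees
of Freedom for Each

Symbol    Formula                                                                                                                             Degrees of freedom

$S_T$     $C_{yyT} - \dfrac{C_{xyT}^2C_{zzT} + C_{yzT}^2C_{xxT} - 2C_{xyT}C_{yzT}C_{xzT}}{C_{xxT}C_{zzT} - C_{xzT}^2}$                         $N - 3$

$S_w$     $C_{yyw} - \dfrac{C_{xyw}^2C_{zzw} + C_{yzw}^2C_{xxw} - 2C_{xyw}C_{yzw}C_{xzw}}{C_{xxw}C_{zzw} - C_{xzw}^2}$                         $N - k - 2$

$S_b$     $S_T - S_w$                                                                                                                         $k - 1$

$S_2$     $C_{yyw} - \sum_{i=1}^{k} \dfrac{C_{xyi}^2C_{zzi} + C_{yzi}^2C_{xxi} - 2C_{xyi}C_{yzi}C_{xzi}}{C_{xxi}C_{zzi} - C_{xzi}^2}$          $N - 3k$

$S_1$     $S_w - S_2$                                                                                                                         $2(k - 1)$

$S_3$     $C_{yyb} - \dfrac{C_{xyb}^2C_{zzb} + C_{yzb}^2C_{xxb} - 2C_{xyb}C_{yzb}C_{xzb}}{C_{xxb}C_{zzb} - C_{xzb}^2}$                         $k - 3$

$S_4$     $S_b - S_3$                                                                                                                         2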


The sums of squares listed in Table 15.7 are analogous in 
meaning to those in Table 15.3 except that the phrase “regression 
plane” is to be substituted for “regression line.” The tests of 
hypotheses are also analogous. 


MATCHED REGRESSION ESTIMATES. TWO POPULATIONS 
WITH TWO PREDICTOR VARIABLES 

The procedures and the rationale employed in this situation 
are analogous to those described on page 399 except that (1) the 
computing formulas are somewhat more involved, (2) the figure 
is three dimensional, and (3) the region of significance is not a 
line segment but a portion of a plane. 

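Regression estimates can be calculated for each group:

$\hat{Y}_{1\alpha} = a_1 + b_{x1}X_\alpha + b_{z1}Z_\alpha$  and  $\hat{Y}_{2\alpha} = a_2 + b_{x2}X_\alpha + b_{z2}Z_\alpha$,

where

(15.39)  $b_{xi} = \dfrac{C_{yxi}C_{zzi} - C_{yzi}C_{xzi}}{C_{xxi}C_{zzi} - C_{xzi}^2}$

(15.40)  $b_{zi} = \dfrac{C_{yzi}C_{xxi} - C_{yxi}C_{xzi}}{C_{xxi}C_{zzi} - C_{xzi}^2}$

(15.41)  $a_i = \bar{Y}_i - b_{xi}\bar{X}_i - b_{zi}\bar{Z}_i$

For each pair of fixed variables, X’, Z’, the significance of the
difference

(15.42)  $D = (a_1 - a_2) + (b_{x1} - b_{x2})X' + (b_{z1} - b_{z2})Z'$

may be tested.
The variance of D is

(15.43)  $s_D^2 = \dfrac{PQ}{N - 6}$

where

(15.44)  $P = \sum_{i=1}^{2}\left(C_{yyi} - b_{xi}C_{yxi} - b_{zi}C_{yzi}\right)$

and

(15.45)  $Q = \dfrac{1}{N_1} + \dfrac{1}{N_2} + \sum_{i=1}^{2}\dfrac{C_{zzi}(X' - \bar{X}_i)^2 - 2C_{xzi}(X' - \bar{X}_i)(Z' - \bar{Z}_i) + C_{xxi}(Z' - \bar{Z}_i)^2}{C_{xxi}C_{zzi} - C_{xzi}^2}$

The ratio $t = D/s_D$ has "Student's" distribution with N − 6 degrees
of freedom.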
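As a hedged illustration of Formulas (15.39) to (15.45), the sketch below computes the ratio t at one chosen point (X’, Z’). The class name `GroupSums`, the function name `t_ratio`, and the numbers in the demonstration are hypothetical; they are not Bond's data.

```python
import math
from dataclasses import dataclass


@dataclass
class GroupSums:
    """Summary quantities for one group (hypothetical container, not from the text)."""
    n: int
    mean_y: float
    mean_x: float
    mean_z: float
    c_yy: float
    c_xx: float
    c_zz: float
    c_yx: float
    c_yz: float
    c_xz: float

    def det(self):
        return self.c_xx * self.c_zz - self.c_xz ** 2

    def b_x(self):  # Formula (15.39)
        return (self.c_yx * self.c_zz - self.c_yz * self.c_xz) / self.det()

    def b_z(self):  # Formula (15.40)
        return (self.c_yz * self.c_xx - self.c_yx * self.c_xz) / self.det()

    def a(self):  # Formula (15.41)
        return self.mean_y - self.b_x() * self.mean_x - self.b_z() * self.mean_z

    def residual(self):  # one term of P in Formula (15.44)
        return self.c_yy - self.b_x() * self.c_yx - self.b_z() * self.c_yz


def t_ratio(g1, g2, x, z):
    """Return (t, degrees of freedom) for the difference D at the point (x, z)."""
    d = (g1.a() - g2.a()) + (g1.b_x() - g2.b_x()) * x + (g1.b_z() - g2.b_z()) * z
    p = g1.residual() + g2.residual()
    q = 1 / g1.n + 1 / g2.n
    for g in (g1, g2):
        dx, dz = x - g.mean_x, z - g.mean_z
        q += (g.c_zz * dx * dx - 2 * g.c_xz * dx * dz + g.c_xx * dz * dz) / g.det()
    df = g1.n + g2.n - 6
    return d / math.sqrt(p * q / df), df


if __name__ == "__main__":
    # Made-up summary values for two groups of 20 cases each.
    exp = GroupSums(20, 5.0, 10.0, 3.0, 50.0, 40.0, 30.0, 20.0, 15.0, 5.0)
    ctl = GroupSums(20, 6.0, 10.0, 3.0, 50.0, 40.0, 30.0, 20.0, 15.0, 5.0)
    print(t_ratio(exp, ctl, 10.0, 3.0))
```

Evaluating `t_ratio` over a grid of (X’, Z’) points and marking where |t| exceeds the tabled value gives a rough numerical picture of the region of significance.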




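Region of Significance. From the inequality $D^2 - t_\alpha^2 s_D^2 \geq 0$,
a region of significance at level $\alpha$ may be obtained by a procedure analogous
to that on page 400, except that the region which, in that discussion,
lay on the X-axis, now lies in the XZ plane. From this
inequality, the limits of the region of significance may be found
by drawing the curve which is given by the second degree equation

(15.46)  $AX^2 + 2BXZ + CZ^2 + 2EX + 2GZ + H = 0$,

where

(15.47)  $A = (b_{x1} - b_{x2})^2 - \dfrac{t_\alpha^2 P}{N - 6}\sum_{i=1}^{2}\dfrac{C_{zzi}}{C_{xxi}C_{zzi} - C_{xzi}^2}$

(15.48)  $B = (b_{x1} - b_{x2})(b_{z1} - b_{z2}) + \dfrac{t_\alpha^2 P}{N - 6}\sum_{i=1}^{2}\dfrac{C_{xzi}}{C_{xxi}C_{zzi} - C_{xzi}^2}$

(15.49)  $C = (b_{z1} - b_{z2})^2 - \dfrac{t_\alpha^2 P}{N - 6}\sum_{i=1}^{2}\dfrac{C_{xxi}}{C_{xxi}C_{zzi} - C_{xzi}^2}$

(15.50)  $E = (a_1 - a_2)(b_{x1} - b_{x2}) + \dfrac{t_\alpha^2 P}{N - 6}\sum_{i=1}^{2}\dfrac{C_{zzi}\bar{X}_i - C_{xzi}\bar{Z}_i}{C_{xxi}C_{zzi} - C_{xzi}^2}$

(15.51)  $G = (a_1 - a_2)(b_{z1} - b_{z2}) + \dfrac{t_\alpha^2 P}{N - 6}\sum_{i=1}^{2}\dfrac{C_{xxi}\bar{Z}_i - C_{xzi}\bar{X}_i}{C_{xxi}C_{zzi} - C_{xzi}^2}$

(15.52)  $H = (a_1 - a_2)^2 - \dfrac{t_\alpha^2 P}{N - 6}\left[\sum_{i=1}^{2}\dfrac{C_{zzi}\bar{X}_i^2 - 2C_{xzi}\bar{X}_i\bar{Z}_i + C_{xxi}\bar{Z}_i^2}{C_{xxi}C_{zzi} - C_{xzi}^2} + \dfrac{1}{N_1} + \dfrac{1}{N_2}\right]$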

When the coefficients A to H have been calculated the curve 
which is represented by equation (15.46) may be plotted on the 
X, Z plane. The curve is a conic. The determination of conic 
sections from a second degree equation is discussed in textbooks 
on plane analytic geometry. 

In order to assist the reader a brief discussion of the procedure
in plotting the conic will be given. The discussion will not be
complete, but will cover situations which are likely to arise in
practice.

The conic, if it exists at all, will be an ellipse or hyperbola, and
in general will be centered at a point other than the intersection
of the XZ axes and will be oblique to these axes. Plotting such
a conic is fairly laborious but the work can be considerably simplified
by locating new axes X’, Z’ in relation to which the conic is a
central conic and determining the equation of the conic in terms
of X’, Z’ coordinates.




We shall assume that $AC - B^2 \neq 0$.
The new axes intersect at the point $(X_0, Z_0)$ with coordinates

(15.53)  $X_0 = \dfrac{GB - CE}{AC - B^2}$

and

(15.54)  $Z_0 = \dfrac{BE - AG}{AC - B^2}$

This point is to be marked on the diagram. The new X’ axis is
a line passing through the point $(X_0, Z_0)$ with slope

(15.55)  $m = \dfrac{C - A \pm \sqrt{4B^2 + (A - C)^2}}{2B}$

The sign of the radical in m is to be taken as the same as that of B.
The X’ axis is given by the equation $Z = Z_0 + m(X - X_0)$. This
line should be drawn on the chart and a line drawn perpendicular
to it at $(X_0, Z_0)$.

In terms of the new X’, Z’ axes, the conic has the form

(15.56)  $A'(X')^2 + C'(Z')^2 + H' = 0$

where

(15.57)  $A' = (A + 2Bm + Cm^2)/(1 + m^2)$

(15.58)  $C' = (Am^2 - 2Bm + C)/(1 + m^2)$

with m taken from (15.55), the sign before the radical being the same as that of B, and

(15.59)  $H' = AX_0^2 + 2BX_0Z_0 + CZ_0^2 + 2EX_0 + 2GZ_0 + H$

It does not seem at all likely that in practice H’ will be equal to
zero, hence we shall assume that it is not zero.

The type of curve represented by Equation (15.56) can be 
determined by examination of the coefficients A’, C’, and H’ as
follows: 

(1) If A’, C’, and H' all have the same sign, there is no conic 
and no region of significance; 

(2) If A’ and C’ have the same sign, but H’ has the opposite 
sign, there is a region of significance bounded by an ellipse; 

(3) If A’ and C’ have different signs, there is a region of 
significance bounded by a hyperbola. 
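The steps from the second degree equation to this classification can be sketched as follows. The function name `locate_conic` and the returned field names are hypothetical, and the handling of B = 0 (where the slope formula cannot be evaluated and the original axes are simply kept) is an assumption added here, not something stated in the text.

```python
import math


def locate_conic(A, B, C, E, G, H):
    """Center, axis slope, transformed coefficients and type of the conic
    A*X^2 + 2*B*X*Z + C*Z^2 + 2*E*X + 2*G*Z + H = 0."""
    disc = A * C - B * B
    if disc == 0:
        raise ValueError("AC - B^2 must not be zero")
    x0 = (G * B - C * E) / disc
    z0 = (B * E - A * G) / disc
    if B == 0:
        m = 0.0  # assumption: axes already parallel to the grid, no rotation needed
    else:
        radical = math.sqrt(4 * B * B + (A - C) ** 2)
        m = (C - A + math.copysign(radical, B)) / (2 * B)  # radical takes the sign of B
    a_new = (A + 2 * B * m + C * m * m) / (1 + m * m)
    c_new = (A * m * m - 2 * B * m + C) / (1 + m * m)
    h_new = A * x0 * x0 + 2 * B * x0 * z0 + C * z0 * z0 + 2 * E * x0 + 2 * G * z0 + H
    if a_new * c_new < 0:
        kind = "hyperbola"
    elif a_new * h_new > 0:
        kind = "no conic"  # all three coefficients share one sign
    else:
        kind = "ellipse"   # H' assumed non-zero, as in the text
    return {"center": (x0, z0), "slope": m,
            "A'": a_new, "C'": c_new, "H'": h_new, "kind": kind}


if __name__ == "__main__":
    # (X - 1)^2 / 4 + (Z - 2)^2 = 1 written in the general form.
    print(locate_conic(1, 0, 4, -1, -8, 13))
```

Checking the function on a conic whose center and type are known in advance, as in the demonstration, is a quick guard against sign errors in the coefficients.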

If the conic exists, it can best be plotted when Equation (15.56)
is put into the form

(15.60)  $\dfrac{(X')^2}{-H'/A'} + \dfrac{(Z')^2}{-H'/C'} = 1$


[Figure: scatter diagram of the two groups. Horizontal axis: X = Intelligence Score ÷ 2, marked from 10 to 160. Plotted points: × Group I = Exp., ○ Group II = Control. Marked area: Region of Significance, α = .01, Exp. Better.]

Fig. 15-5. Comparison of two groups on a test of superstitions.
From Austin Bond.³


Plotting this equation on the original XZ chart would be
difficult because the new axes are oblique to the lines of the grid.
However the graph of Equation (15.60) may be plotted on another
grid; then the axes of this grid may be aligned with the X’, Z’
axes as drawn on the original grid, and the graph traced on the
original grid with tracing paper.

In addition to plotting the conic which outlines the region of
significance, it is desirable to draw the line of non-significance.
This line has the equation

(15.61)  $(a_1 - a_2) + (b_{x1} - b_{x2})X + (b_{z1} - b_{z2})Z = 0$,

where the a's and b’s are as given by Formulas (15.39) to (15.41).

Examples of the method may be found in an experiment
conducted by Bond³ in teaching genetics to college freshmen
in a course called “Science and Civilization.” Experimental
and control groups had unequal distribution on intelligence as
measured by the American Council Psychological Test and unequal
distribution on the initial application of the various tests employed.
The symbols and formulas used by Bond follow the pattern in the
original paper by Johnson and Neyman⁵. The formulas stated
in the preceding paragraphs of this text are equivalent to theirs 
but use different symbols and permit an easier method of computa- 
tion. Bond’s study presents eight figures comparing his two 
groups on eight different tests. Some of the critical regions are 
hyperbolas, some ellipses; in some the experimental method favors 
one type of student, in some another. 

One test dealt with common superstitions about heredity, the 
score indicating the number of such superstitions accepted as true. 
Sample data are presented in Table 15.8 and Bond’s graph showing 
critical region is reproduced as Figure 15-5. Here intelligence 
scores are distributed along the horizontal axis; initial scores on 
the superstition test are distributed along the vertical axis; final 
scores on the superstition test are not shown but are to be imagined 
as distributed along a third axis perpendicular to the other two. 
The experimental method produced better (that is lower) final 
scores over almost the entire range of the observations, and the 
difference was significant for those who had originally accepted 
at least two of the superstitions listed and who had intelligence 
average to low. 

Sample data from a test on opinions on imperialism shown 
graphically in Figure 15-6 present a somewhat different picture. 
Looking at the line of non-significance we note that regardless of 
the intelligence level the experimental method produced better 
results on the final test for students whose initial test score was 
low while the method used with the control group produced better 
results for students whose initial test score was high. However 
such differences are significant only for students in approximately 
the lower third of the distribution of scores and these derive 
significant advantage from the experimental method. 


[Figure: scatter diagram of the two groups. Horizontal axis: X = Intelligence Score ÷ 2, marked from 10 to 160. Vertical axis: Initial Test Score, marked from 10 to 40. Plotted points: × Group I = Exp., ○ Group II = Control. Marked areas: Region of Significance, α = .01, Control Better (upper); Region of Significance, α = .01, Exp. Better (lower).]

Fig. 15-6. Comparison of two groups on a test of opinions on imperialism.
From Austin Bond.³


TABLE 15.8 Sample Data for Comparison of Two Groups in Regard to
Superstitions Concerning the Laws of Heredity*

Datum                                                   Experimental group        Control group

Number of cases                                         $N_1 = 54$                $N_2 = 57$
Sum of scores on final superstitions test               $\Sigma Y = 90$           $\Sigma Y = 176$
Sum of scores on initial superstitions test             $\Sigma Z = 119$          $\Sigma Z = 173$
Sum of scores on intelligence                           $\Sigma X = 4460$         $\Sigma X = 4550$
Mean final test score                                   $\bar{Y}_1 = 1.667$       $\bar{Y}_2 = 3.088$
Mean initial test score                                 $\bar{Z}_1 = 2.204$       $\bar{Z}_2 = 3.035$
Mean intelligence score                                 $\bar{X}_1 = 82.59$       $\bar{X}_2 = 79.82$
$\Sigma(Y_{i\alpha} - \bar{Y}_i)^2$                     $C_{yy1} = 101.9$         $C_{yy2} = 280.4$
$\Sigma(Z_{i\alpha} - \bar{Z}_i)^2$                     $C_{zz1} = 172.6$         $C_{zz2} = 306.0$
$\Sigma(X_{i\alpha} - \bar{X}_i)^2$                     $C_{xx1} = 26806$         $C_{xx2} = 26250$
$\Sigma(Y_{i\alpha} - \bar{Y}_i)(Z_{i\alpha} - \bar{Z}_i)$   $C_{yz1} = 74.56$    $C_{yz2} = 217.94$
$\Sigma(Y_{i\alpha} - \bar{Y}_i)(X_{i\alpha} - \bar{X}_i)$   $C_{yx1} = -416.6$   $C_{yx2} = -813.9$
$\Sigma(Z_{i\alpha} - \bar{Z}_i)(X_{i\alpha} - \bar{X}_i)$   $C_{xz1} = -888.4$   $C_{xz2} = -300.4$

Y = Final score on superstitions test
Z = Initial score on superstitions test
X = Score on American Council Psychological Examination
Large value of Y or Z indicates acceptance of superstitions
* From Bond³




REFERENCES

1. Anderson, R. L. and Bancroft, T. A., Statistical Theory in Research, New York,
McGraw-Hill Book Co., 1952. Pages 297-312, 369-375.

2. Bills, R. E., “Nondirective Play Therapy with Retarded Readers,” Journal
of Consulting Psychology, 14 (1950), 140-149.

3. Bond, Austin D., An Experiment in the Teaching of Genetics. New York,
Teachers College, Columbia University, Bureau of Publications No. 797, 1940.

4. Dixon, W. J. and Massey, F. J., Introduction to Statistical Analysis, New York,
McGraw-Hill Book Co., 1951. Pages 173-182.

5. Johnson, Palmer O. and Neyman, Jerzy, “Tests of Certain Linear Hypotheses
and Their Application to Some Educational Problems,” Statistical Research
Memoirs, 1 (1936), 57-93.

6. Lindquist, E. F., Statistical Analysis in Educational Research, Cambridge, Mass.,
1940, Houghton Mifflin Co., Chapter 6.

7. Mood, Alexander, Introduction to the Theory of Statistics, New York, McGraw-
Hill Book Co., 1950. Pages 350-358.

8. Quenouille, M. H., “The Analysis of Covariance and Non-orthogonal Comparisons,”
Biometrics, 4 (1948), 240-246.

9. Rao, C. R., Advanced Statistical Methods in Biometric Research, New York,
John Wiley and Sons, 1952. Pages 119-128.


16 Percentiles 


In the preceding chapters we have considered inferences
based on sample moments such as the mean, variance and covariance,
and on functions of these moments. In this chapter inferences
based on percentiles will be discussed.

The computation of percentiles is desirable in two classes of
situations:

1. When percentiles are actually more satisfactory statistics
for estimation of population characteristics. For some populations
the moments do not satisfy the conditions of consistency and
efficiency as these are described in Chapters 3 and 7, but percentiles,
or functions of percentiles, do satisfy the condition of consistency
and at least approximate the condition of efficiency.

2. When percentiles are less satisfactory statistics than the
moments but serve to simplify computations.

These situations and the related computations will presently
be discussed in detail.

Notation for Percentiles. In agreement with the notation
used in previous chapters, percentiles will be indicated by a symbol
based on the variable, and a subscript which is the decimal equivalent
of the percentile. Thus $X_{.20}$ is to be the notation for the
twentieth sample percentile of the variable X, $X_{.50}$ is to be the
median, etc. The corresponding population percentiles will be
denoted as $\xi_{.20}$, $\xi_{.50}$, etc. ($\xi$ is the Greek letter xi). The symbols $X_p$
and $\xi_p$ will indicate the (100p)th percentile, because p is the
decimal equivalent of the percent 100p. We shall need to refer
to the population ordinate at the (100p)th percentile. This will
be denoted as $y_p$.

Quantiles. The term quantile is often used in place of percentile.
It differs from the percentile only in the fact that it is
referred directly to the proportion instead of to the percent
equivalent of the proportion. Thus $X_{.25}$, in the present notation,
is the 25th percentile or the quantile of order .25. The terms
decile, quartile, and percentile are all subsumed under the term
quantile. Because the term percentile is in more common use
that term will be employed here.


Distribution of Percentiles in Large Samples. In large samples
$X_p$ has an approximately normal distribution with mean $\xi_p$ and
standard error

(16.1)  $\dfrac{1}{y_p}\sqrt{\dfrac{p(1 - p)}{N}}$

In particular, the median has an approximately normal distribution
with mean $\xi_{.50}$ and standard error

(16.2)  $\dfrac{1}{2y_{.50}\sqrt{N}}$

If sampling is from a normal population the ordinate at the
population median is $1/(\sigma\sqrt{2\pi})$. Therefore, for samples from a
normal population, the standard error of the median is

(16.3)  $\dfrac{1}{2y_{.50}\sqrt{N}} = \sigma\sqrt{\dfrac{\pi}{2N}} = \sqrt{1.57}\,\dfrac{\sigma}{\sqrt{N}}$

Consistency of Estimate. From Formula (16.1) it is clear
that the standard error of a percentile approaches zero as N
becomes increasingly great. Hence, for large values of N there
is little likelihood that the sample percentile will differ greatly
from the population percentile, and the sample percentile is, therefore,
a consistent estimate of the population percentile.

In sampling from a normal population the mean and median
are both consistent estimates of $\mu$. However there exist populations
such that the sample mean is not a consistent estimate of
the central parameter but the median is. These populations
are characterized by the fact that extreme cases are likely to occur
frequently in samples.

Efficiency of Estimate. It has been stated that, for samples
from a normal population, both the sample mean and sample
median are consistent estimates of $\mu$. However, for such samples
the standard error of the mean is $\sigma/\sqrt{N}$ but the standard error of the
median is $\sqrt{1.57}\,\sigma/\sqrt{N}$. Since the mean has the smaller standard
error it is preferred to the median as an estimate of $\mu$.

Similarly in samples from a normal population a consistent
estimate of $\sigma$ is given by

$s = \sqrt{\dfrac{\Sigma(X - \bar{X})^2}{N - 1}}$

and also by $.74(X_{.75} - X_{.25})$. However, s is the preferred estimate
because its standard error is smaller.
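The standard errors above can be checked numerically. The sketch below is only an illustration: the function name `percentile_standard_error` is hypothetical, and it assumes a normal population so that the ordinate $y_p$ can be taken from Python's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def percentile_standard_error(p, n, sigma=1.0):
    """Large-sample standard error of the (100p)th percentile, Formula (16.1),
    assuming sampling from a normal population with standard deviation sigma."""
    population = NormalDist(0.0, sigma)
    y_p = population.pdf(population.inv_cdf(p))  # ordinate at the percentile
    return math.sqrt(p * (1 - p) / n) / y_p


if __name__ == "__main__":
    n = 100
    se_median = percentile_standard_error(0.50, n)
    se_mean = 1.0 / math.sqrt(n)
    print("standard error of median:", se_median)
    print("standard error of mean:  ", se_mean)
    print("ratio of variances:      ", (se_median / se_mean) ** 2)  # pi/2, about 1.57
    # Constant that turns the interquartile range into an estimate of sigma.
    print("1 / (z.75 - z.25):       ", 1 / (2 * NormalDist().inv_cdf(0.75)))
```

The ratio of variances printed here is the 1.57 of Formula (16.3), and the last constant is the .74 used with $X_{.75} - X_{.25}$.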

In general there may be many consistent estimates of a popula- 
tion characteristic. The estimate which has the least standard 
error is called efficient. Other estimates, called inefficient, may, 
nevertheless, be used because of computational convenience. 
When inefficient estimates are used it is desirable to have a measure 
of their efficiency. For this purpose the following measure of the 
efficiency of a statistic has been adopted: 


(16.4)  Efficiency of a given statistic = $\dfrac{\text{variance of efficient statistic}}{\text{variance of given statistic}}$

It is evident from the formula that an efficient statistic has
efficiency one, but an inefficient statistic has efficiency less than
one. When this formula is applied to the median of a sample
from a normal population the efficiency of this statistic is

(16.5)  Efficiency of median = $\dfrac{\sigma^2/N}{1.57(\sigma^2/N)} = .64$

An important interpretation of this concept of efficiency is that
it is a measure of a loss in number of cases. An efficiency of .80 is
equivalent to sampling reliability based on 80% of the cases in the
sample.

Classes of Efficient Statistics. Certain classes of statistics are
known to be efficient. Among the best known of these are the
maximum likelihood statistics. These are statistics which, when
substituted for the parameter in expressions for the probability
density, make the likelihood of the sample a maximum. Under
the usual assumptions, that samples are drawn from a normal or
binomial population, most of the statistics which have been
studied in this text, as the percent, mean, variance, correlation
coefficient and regression coefficient are maximum likelihood
statistics. These statistics not only have minimum standard
error but are distributed normally for large samples.

The maximum likelihood statistics are a subclass of a class of
statistics known as BAN estimates, which have the common
property of being normally distributed with minimum variance
and with the true parameter value as mean. BAN stands for best
asymptotically normal. For a fuller treatment of these classes
of estimates the reader must consult the mathematical literature
of statistics.


Estimation of the Mean and Standard Deviation by Percentiles.
It has been stated that the median has efficiency .64 as
an estimate of the mean of a normal population. More efficient
estimates can be obtained by computing averages of several 
percentiles symmetrically placed about the median. Several 
such estimates together with their efficiencies are listed in Table 
XVII, in the Appendix. This table shows that very high efficiency 
can be achieved by use of several percentiles. While even five 
percentiles provide an efficiency which is less than unity there 
may be an advantage in estimating a mean by these methods 
because of great saving of time. Such a saving will occur when 
a frequency distribution has data grouped in unequal class inter- 
vals. The method is particularly important when the ends of a 
distribution are open so that the mean cannot be computed at 
all by usual methods. 

Another occasion when the use of percentiles provides a saving 
of time is when data are punched on IBM cards. The cards can 
be sorted in order and the percentiles can be located by a simple
count.

When percentiles are used to estimate the standard deviation 
of a normal population the statistics listed in Table XVIII have 
high efficiency. 

The decision as to whether to use an efficient statistic or to 
choose one of lesser efficiency depends upon circumstances. If 
speed is important, estimates based on two percentiles may be 
used. If the number of cases is great, a saving in labor can be 
achieved without too great loss in efficiency by using estimates 
based on four or six percentiles instead of just two. When data 
are scarce or are expensive to obtain and labor of computation is 
less important than labor of obtaining original data, then efficient 
statistics should always be used in order to obtain the maximum 
information from the minimum number of cases. When data are
cheap and readily available and when computation must be done 
by inexperienced persons — as is often the case in industry — or 
when the results of computation are needed very quickly, then it 
may be advisable to use inefficient statistics and to compensate 
for the loss of information by taking a larger sample. 


EXERCISE 16.1

The frequency distribution below is based on scores of 160 persons in
a mental test of 30 multiple-choice items. Table 16.1 contains estimates
of the mean and standard deviation based on $\bar{X}$, s and on percentiles.
The reader may verify the indicated values and may compute estimates
based on other statistics. The reader should check the accuracy of computations
based on percentiles with that based on statistics which have
efficiency one.

TABLE 16.1 Frequency Distribution of Scores of 160 Persons on 30 Items

  X     f
 28     2
 27     2
 26     4
 25     7
 24    11
 23    21
 22    23
 21    26
 20    19
 19    17
 18    10
 17     9
 16     6
 15     1
 14     1
 13     1

Estimates of the mean

Formula                                                          Value
$\bar{X} = \Sigma fX/N$                                          21.012
$X_{.50}$                                                        21.115
$\tfrac{1}{2}(X_{.25} + X_{.75})$                                21.020
$\tfrac{1}{3}(X_{.17} + X_{.50} + X_{.83})$                      20.993
$\tfrac{1}{4}(X_{.125} + X_{.375} + X_{.625} + X_{.875})$        20.992

Estimates of the standard deviation

Formula                                                          Value
$s = \sqrt{\dfrac{\Sigma fX^2 - (\Sigma fX)^2/N}{N - 1}}$        2.76
$.339(X_{.93} - X_{.07})$                                        2.81
$.171(X_{…} + X_{…} - X_{…} - X_{…})$                            2.79
$.120(X_{…} + X_{…} + X_{…} - X_{…} - X_{…} - X_{…})$            2.76

Estimation of the Mean and Standard Deviation from Item 
Analysis Data. Concepts analogous to those used in estimating 
the mean and standard deviation of a population from percentiles 
may be used in estimating these population characteristics from 
item analysis data. Answers to each item are classified as correct 
with score 1 or as incorrect with score 0. The item analysis which 
is considered here is based on high and low groups of test scores, 
a test score being the number of items answered correctly. The 
high and low groups are placed symmetrically about the median 
of test scores, as the high and low 50%, or high and low 27%. 
Data are assumed available for all items in the test. 

The discussion will be facilitated by the introduction of an 
appropriate notation: 

N = the number of persons for whom test scores are available.

k = the number of test items.

p = the proportion in one of the extreme groups, p ≤ .50.
Hence the number of persons in one of the groups is pN.

$R_{Hi}$ = the number of correct responses to the ith item in the high
group.

$R_{Li}$ = the corresponding number for the low group.

$X_{H\alpha}$, $X_{L\alpha}$ = the score of the αth individual in the high
group and low group, respectively.

$y_p$ = ordinate of unit normal curve at the point below which the area is
equal to p.
In the type of item analysis described above, $R_{Hi}$ and $R_{Li}$ are
known for each item. Consider now the sum $\sum_{i=1}^{k} R_{Hi}$ over all items.
This is the total number of correct responses in the high group.
Now the total number of correct responses in the high group can
also be obtained as the sum of all scores of individuals in the high
group, namely $\sum_{\alpha=1}^{pN} X_{H\alpha}$. Since these two sums are equal to the
same thing

$\sum_{i=1}^{k} R_{Hi} = \sum_{\alpha=1}^{pN} X_{H\alpha}$  and also  $\sum_{i=1}^{k} R_{Li} = \sum_{\alpha=1}^{pN} X_{L\alpha}$.

The mean scores of the two subgroups are

$\dfrac{\Sigma X_{H\alpha}}{pN}$  and  $\dfrac{\Sigma X_{L\alpha}}{pN}$

On the assumption that the scores are distributed normally
these means are approximations of the corresponding population
means in large samples, or

(16.6)  $\dfrac{\Sigma X_{H\alpha}}{pN} \cong \mu + \dfrac{y_p}{p}\sigma$

and

(16.7)  $\dfrac{\Sigma X_{L\alpha}}{pN} \cong \mu - \dfrac{y_p}{p}\sigma$

The symbol ≅ means approximately equal.
If in (16.6) and (16.7) we substitute for $\Sigma X_{H\alpha}$ and $\Sigma X_{L\alpha}$ their
equals $\Sigma R_{Hi}$ and $\Sigma R_{Li}$ and the two formulas are combined we have

$\dfrac{1}{2}\left(\dfrac{\Sigma R_{Hi}}{pN} + \dfrac{\Sigma R_{Li}}{pN}\right) \cong \mu$

Hence an appropriate estimate of the mean for large samples is

(16.8)  Estimate of $\mu = \dfrac{\Sigma R_{Hi} + \Sigma R_{Li}}{2pN}$

Similarly

(16.9)  Estimate of $\sigma = \dfrac{1}{2Ny_p}\left[\Sigma R_{Hi} - \Sigma R_{Li}\right]$

When upper and lower halves are used Formulas (16.8)
and (16.9) yield

(16.10)  Estimate of $\mu = \dfrac{\Sigma R_{Hi} + \Sigma R_{Li}}{N}$

(16.11)  Estimate of $\sigma = 1.254\,\dfrac{\Sigma R_{Hi} - \Sigma R_{Li}}{N}$

When upper and lower 27% are used the estimates are

(16.12)  Estimate of $\mu = 1.851\,\dfrac{\Sigma R_{Hi} + \Sigma R_{Li}}{N}$

(16.13)  Estimate of $\sigma = 1.512\,\dfrac{\Sigma R_{Hi} - \Sigma R_{Li}}{N}$
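Formulas (16.8) and (16.9) lend themselves to a short computational sketch. The function name `item_analysis_estimates` and the item counts in the demonstration are hypothetical; the ordinate $y_p$ is taken from Python's `statistics.NormalDist` rather than from a printed table.

```python
from statistics import NormalDist


def item_analysis_estimates(r_high, r_low, n, p):
    """Estimates of the mean and standard deviation of test scores from
    item counts in the high and low groups, Formulas (16.8) and (16.9).

    r_high, r_low: numbers of correct responses per item in each group
    n: total number of persons tested
    p: proportion of persons in each extreme group (p <= .50)
    """
    unit_normal = NormalDist()
    y_p = unit_normal.pdf(unit_normal.inv_cdf(p))  # ordinate cutting off area p
    sum_high, sum_low = sum(r_high), sum(r_low)
    mean_estimate = (sum_high + sum_low) / (2 * p * n)
    sd_estimate = (sum_high - sum_low) / (2 * n * y_p)
    return mean_estimate, sd_estimate


if __name__ == "__main__":
    # Made-up counts for a three-item test given to 100 persons,
    # using upper and lower halves.
    print(item_analysis_estimates([40, 45, 30], [20, 25, 10], n=100, p=0.50))
```

With p = .50 the multiplier 1/(2$y_p$) is close to the constant of Formula (16.11), and with p = .27 it is close to the 1.512 of Formula (16.13).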

The estimates based on items give results which are quite
accurate if all the items making up the test used as criterion are
available. If some of the items have been eliminated after a review
of the item analysis, the remaining item data may still be useful
for estimation of $\mu$ and $\sigma$. The important consideration in determining
usefulness of the estimates is whether the elimination
of items has changed the order of test scores to any great extent.
If many test scores have been shifted from their original grouping,
high, low or middle, then the estimates will not be satisfactory.

It is interesting to remark that the estimate of standard
deviation based on high-low 50% as shown in Formula (16.11) is
identical with the mean deviation of test scores when data are
available for all items. The mean deviation is assumed based on
deviations from the median.

$1.254\,\dfrac{\Sigma R_{Hi} - \Sigma R_{Li}}{N} = 1.254\,\dfrac{\Sigma X_{H\alpha} - \Sigma X_{L\alpha}}{N}$

Now subtract the median test score from each score on the right.
This does not affect the value of the estimate since

$\sum_{\alpha=1}^{N/2} X_{H\alpha} - \sum_{\alpha=1}^{N/2} X_{L\alpha} = \sum_{\alpha=1}^{N/2}(X_{H\alpha} - X_{.50}) - \sum_{\alpha=1}^{N/2}(X_{L\alpha} - X_{.50})$

Also  $(X_{H\alpha} - X_{.50}) = |X_{H\alpha} - X_{.50}|$
and  $-(X_{L\alpha} - X_{.50}) = |X_{L\alpha} - X_{.50}|$

Hence  $1.254\,\dfrac{\Sigma R_{Hi} - \Sigma R_{Li}}{N} = 1.254\,\dfrac{\sum_{\alpha=1}^{N}|X_\alpha - X_{.50}|}{N}$

The expression on the right is an estimate of $\sigma$ based on the
mean deviation. This estimate is known to have efficiency .88.
Hence .88 is also the efficiency of the estimate of $\sigma$ in (16.11).
The estimate of the mean in (16.10) has efficiency one. The
efficiency of estimates when groups are not upper and lower
50% is not known to the authors.

This discussion suggests that the quantity

(16.14)  $\dfrac{R_{Hi} - R_{Li}}{2Ny_p}$

may be called an index of reliability of the ith item corresponding
to the index of reliability suggested by Gulliksen². Like that
index, it has the property that when summed over all items it
provides an estimate of $\sigma$. The corresponding index of item
difficulty is

(16.15)  $\dfrac{R_{Hi} + R_{Li}}{2pN}$

Estimation of the Correlation Coefficient. If the correlation
coefficient, $\rho$, of a bivariate normal population is to be estimated,
the Pearson product moment r is an efficient estimate. Inefficient
estimates of $\rho$ will be described in this section.⁵

The following procedure provides an estimate of $\rho$ which has
efficiency of .52 when $\rho$ = 0:

1. Arrange the cases in order of size on one of the variables,
say X. Select the 27% of the cases for which X is greatest and
the 27% for which X is least. Discard the 46% remaining cases.

2. Find the median Y score of the 54% of cases selected in
step 1. This median may not be the same as the Y median of
the entire sample.

3. Count the cases in each of the four groups and designate
the number in each group as follows:
$n_1$ above Y median of selected cases and in upper 27% of X scores
$n_2$ above Y median of selected cases and in lower 27% of X scores
$n_3$ below Y median of selected cases and in lower 27% of X scores
$n_4$ below Y median of selected cases and in upper 27% of X scores

These four numbers are shown diagrammatically below.

                     Low 27%              46%     High 27%             Total
Above Y median
of selected cases    $n_2$                        $n_1$                $n_1 + n_2 = .27N$
Below Y median
of selected cases    $n_3$                        $n_4$                $n_3 + n_4 = .27N$
Total                $n_2 + n_3 = .27N$           $n_1 + n_4 = .27N$   $n_1 + n_2 + n_3 + n_4 = .54N$

The estimate of the correlation coefficient is read from
Chart XV. As a first step in the use of this figure compute $X_0$. If
$n_1 + n_3 > n_2 + n_4$,

$X_0 = \dfrac{n_1 + n_3}{n_1 + n_2 + n_3 + n_4}$

Locate $X_0$ on the horizontal scale of Chart XV and draw a vertical
line through $X_0$ to intersect the curve marked .27. Draw a horizontal
line from this intersection over to the vertical scale. The point
where this horizontal cuts the vertical scale indicates the required
estimate of $\rho$. Of course one does not actually draw these vertical
and horizontal lines on the diagram but notes their position by
using a ruler or the edge of a card.
If $n_1 + n_3 < n_2 + n_4$ evaluate the ratio

$X_0 = \dfrac{n_2 + n_4}{n_1 + n_2 + n_3 + n_4}$

and proceed as before, but the estimate is now negative.

Actually, not only 27% but any percentage will serve the 
purpose. However, the estimate is more efficient when 27% is 
used. Chart XV provides a means of estimating p by several 
high-low proportions. In particular, the use of high-low 50% on 
X provides an estimate equivalent to that available from the
tetrachoric coefficient for the special case when each variable is
split at the median of the sample.

The actual procedure can be carried out conveniently by use 
of cards or by a scatter diagram. If cards are available, the 
procedure can be carried out in the manner described above. 
If a scatter diagram is used, the cases are partitioned as in Figure 
16-1. 

From this figure

$X_0 = \dfrac{19 + 19}{19 + 8 + 19 + 8} = \dfrac{38}{54} = .70$

On Chart XV a vertical line through the point X = .70 on the
horizontal axis intersects the line marked .27 in the point with
vertical coordinate $\rho$ = .40. The estimate of $\rho$ is, therefore,
read from Chart XV as .40.

[Figure: scatter diagram with the cases in the Low 27% and High 27% of X scores marked off.]

Fig. 16-1. Scatter diagram of 100 cases
partitioned for estimation of $\rho$.
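The arithmetic that precedes the reading of Chart XV can be sketched as below. The function names are hypothetical. Chart XV itself is not reproduced: the second function applies only to the special case of a high-low 50% split, where the relation between the proportion of concordant cases and $\rho$ for a bivariate normal population split at both medians (Sheppard's formula) is used as an assumption in place of the chart.

```python
import math


def concordance_ratio(n1, n2, n3, n4):
    """Return (X0, sign) from the four cell counts of the high-low table.

    n1: above Y median, upper X group    n2: above Y median, lower X group
    n3: below Y median, lower X group    n4: below Y median, upper X group
    sign is +1 when the estimate of rho is positive, -1 when negative.
    """
    total = n1 + n2 + n3 + n4
    concordant, discordant = n1 + n3, n2 + n4
    if concordant >= discordant:
        return concordant / total, 1
    return discordant / total, -1


def rho_from_median_split(x0, sign):
    """Estimate of rho for a 50% high-low split only (not the 27% curve).

    Assumption: both variables are split at their medians in a bivariate
    normal population, so the proportion concordant is 1/2 + arcsin(rho)/pi.
    """
    return sign * math.sin(math.pi * (x0 - 0.5))


if __name__ == "__main__":
    x0, sign = concordance_ratio(19, 8, 19, 8)  # counts from Figure 16-1
    print(round(x0, 2), sign)                   # value to locate on Chart XV
```

For the 27% split the value of $X_0$ must still be carried to the chart; the sketch only supplies the ratio and the sign.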


REFERENCES

1. Dixon, W. J., and Massey, F. J. Jr., Introduction to Statistical Analysis, New
York, 1951, McGraw-Hill Book Co., Inc., Chapter 16.

2. Gulliksen, Harold, Theory of Mental Tests, New York, 1950, John Wiley & Sons.

3. Kelley, Truman L., “The Selection of Upper and Lower Groups for the Validation
of Test Items,” Journal of Educational Psychology, 30 (1939), 17.

4. Kelley, Truman L., Fundamentals of Statistics, Cambridge, Mass., 1947, Harvard
University Press.

5. Mosteller, Frederick, “On Some Useful ‘Inefficient’ Statistics,” Annals of
Mathematical Statistics, 17 (1946), 377-408.


17 Transformation of Scales 


A great deal of the theory presented in previous chapters
was based on the assumption that the variable under consideration
has a normal distribution. When this assumption
appears dubious, the variable, or scale, may often be transformed
into another variable with a distribution approximating the
normal. The formulas for the variance of certain statistics
involve the unknown parameters of which those statistics are
estimates. Examples are $\sigma_p^2 = \dfrac{PQ}{N}$ and $\sigma_r^2 = \dfrac{(1 - \rho^2)^2}{N - 1}$. In such
a case, it is a great advantage if a related statistic can be found
with variance dependent only on sample size and not on any
unknown parameter. Several transformations which serve one
or both of these purposes will be briefly presented in this chapter.

The transformation of the correlation coefficient r into
$z_r = \tfrac{1}{2}\log_e\dfrac{1 + r}{1 - r}$,
discussed in Chapter 10 and tabulated in Table XII, is an example
of a transformation which achieves both purposes.

Transformation of Proportions into Angles. Because of the
unknown parameter in $\sigma_p^2 = PQ/N$ it is often useful to transform
proportions into angles by the formula

(17.1)  $\phi = 2 \arcsin \sqrt{p}$

This formula may also be written as

(17.2)  $\phi = 2 \sin^{-1} \sqrt{p}$

Both formulas may be read “$\phi$ is twice the angle whose sine is
$\sqrt{p}$.” Arcsin and $\sin^{-1}$ are discussed in trigonometry texts.
If $\phi$ is expressed in radians its variance is approximately

(17.3)  $\sigma_\phi^2 = 1/N$

This formula is valid with only slight inaccuracy when
.05 < P < .95 and N ≥ 20.

For large samples the acceptable range of P is even greater.
The value of $\phi$ in radians corresponding to a given p other
than 0 or 1 can be read from Appendix Table XIXA. Bartlett
has given the formulas

(17.4)  $\phi_0 = 2 \arcsin \sqrt{\dfrac{1}{4N}}$  for p = 0

(17.5)  and  $\phi_1 = 3.1416 - 2 \arcsin \sqrt{\dfrac{1}{4N}}$  for p = 1

The values of $\phi_0$ and $\phi_1$ can be read from Table XIXB. In testing
the hypothesis $P = P_H$ the statistic

(17.6)  $z = \sqrt{N}\left(2 \arcsin \sqrt{p} - 2 \arcsin \sqrt{P_H}\right)$

is useful because its distribution is approximately normal with
unit variance. A test using this statistic is preferable to the test
described in Chapter 3 because the distribution of $\phi$ is more nearly
normal than that of p. When the observations in an analysis of
variance problem are proportions, homogeneity of variance cannot
be assumed because the $\sigma_p^2$ varies with P and with N. If all
proportions are based on the same N and if each is transformed to
an angle, homogeneity of variance is secured because each angle
has the same variance, 1/N, even though the proportions differ.
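The angular transformation and the test statistic (17.6) can be sketched as follows. The function names `angle` and `arcsine_z` are hypothetical, and the sample values in the demonstration are made up.

```python
import math


def angle(p, n):
    """Proportion p (based on n cases) transformed to an angle in radians,
    using Formulas (17.4) and (17.5) when p is exactly 0 or 1."""
    if p == 0:
        return 2 * math.asin(math.sqrt(1 / (4 * n)))
    if p == 1:
        return math.pi - 2 * math.asin(math.sqrt(1 / (4 * n)))
    return 2 * math.asin(math.sqrt(p))


def arcsine_z(p, p_hypothesis, n):
    """Approximately unit normal statistic of Formula (17.6)."""
    return math.sqrt(n) * (angle(p, n) - 2 * math.asin(math.sqrt(p_hypothesis)))


if __name__ == "__main__":
    # Hypothetical case: 60 successes in 100 trials, hypothesis P = .50.
    print(arcsine_z(0.60, 0.50, 100))
```

The value returned is referred to a table of the unit normal curve in the usual way.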

The Square Root Transformation. A very skewed distribution
known as the Poisson Distribution which often arises in
connection with the frequency of occurrence of a very rare event
has $\sigma^2 = \mu$. To make tests of significance concerning the means
of such distributions the variable (that is the number of occurrences)
should be replaced by its square root.

The Logarithmic Transformation. If the scores in an analysis
of variance layout are so distributed that the mean of each subgroup
is approximately equal to its standard deviation, then each
score should be replaced by its logarithm before the analysis of
variance is carried out. For example, if the entries are sample
variances this situation would occur.

Transformation of Ranks to Normal Deviates. The use of 
ranks in computing a correlation coefficient was described in 
Chapter 11. It often seems reasonable to assume that the under- 
lying variable represented by the ranks is normally distributed. 
Appendix Table XX may be used to transform a set of ranks into
a set of scores from a normal distribution in which μ = 50 and
σ = 10, for samples in which 5 ≤ N ≤ 30. To find the normal
equivalent of a rank R when N > 30, first find the proportion

(17.7)    $p = 1 - \dfrac{R - 0.5}{N}$

then read in Table II the normal deviate z_p in a unit normal curve
corresponding to the proportion p. The corresponding normal
deviate when μ = 50 and σ = 10 is


          $Z_p = 10 z_p + 50$
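
For N > 30 the two steps just described can be sketched in a few lines. The sketch assumes that (17.7) is read as p = 1 − (R − 0.5)/N, so that rank 1 belongs to the highest score, and it uses the inverse normal function of the Python standard library in place of Table II. The function name is hypothetical.

```python
from statistics import NormalDist

def rank_to_normal_score(R, N, mean=50.0, sd=10.0):
    """Normal equivalent of rank R among N cases (rank 1 assumed to be highest)."""
    p = 1 - (R - 0.5) / N               # proportion, as (17.7) is read here
    z_p = NormalDist().inv_cdf(p)       # unit normal deviate for proportion p
    return sd * z_p + mean              # Z_p = 10 z_p + 50

# Hypothetical illustration with N = 40 ranks
highest = rank_to_normal_score(1, 40)
lowest = rank_to_normal_score(40, 40)
print(round(highest, 1), round(lowest, 1))
```

Because the proportions for ranks R and N + 1 − R add to one, the two scores are placed symmetrically about 50.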


Normalization of the F Distribution. The F distribution has
so far been tabulated for only a few significance levels.* When
a statistician needs to know the significance of a value of F lying
between these tabulated values, he may take the cube root of the
observed F and substitute it in a formula given by Paulson ⁶


(17.8)    $u = \dfrac{\left(1 - \dfrac{2}{9n_2}\right)F^{1/3} - \left(1 - \dfrac{2}{9n_1}\right)}{\sqrt{\dfrac{2}{9n_2}F^{2/3} + \dfrac{2}{9n_1}}}$


where n₁ and n₂ are the degrees of freedom of the numerator and
the denominator of F. The statistic u has a distribution which is
nearly unit normal when
the error variance has at least 3 degrees of freedom. Therefore,
a good approximation to the significance level for F can be obtained
by computing u and taking α as the area in the two tails of the
normal distribution.

Uniformization. When the form of distribution of a variable
is not known or is presumed to be not normal, the ordinary tests
of significance are not justified. Then the scores may be trans-
formed to ranks and a non-parametric test based on rank order
may be performed.


REFERENCES 


1. Bartlett, M. S., “Some Examples of Statistical Methods of Research in Agricul-
ture and Applied Biology,” Supplement to the Journal of the Royal Statistical
Society, 4 (1937), 137-183.

2. Bartlett, M. S., “The Use of Transformations,” Biometrics, 3 (1947), 39-52.

3. Edwards, Allen, Experimental Design in Psychological Research, New York,
1950, Rinehart and Co., Inc., pages 198-204.

4. Eisenhart, C., Hastay, M. W. and Wallis, W. A., Selected Techniques of Statistical
Analysis, New York, McGraw-Hill Book Co., Inc., 1947. Chapters 7 and 16.

5. Hotelling, H. and Frankel, L. R., “The Transformation of Statistics to Simplify
their Distribution,” Annals of Mathematical Statistics, 9 (1938), 87-96.

6. Paulson, Edward, “An Approximate Normalization of the Analysis of Variance
Distribution,” Annals of Mathematical Statistics, 13 (1942), 233-235.

7. Rao, C. R., Advanced Statistical Methods in Biometric Research, New York, 1952,
John Wiley and Sons, Inc., pp. 207-214.


* The forthcoming edition of Pearson's Statistical Tables (Cambridge University 
Press), will contain the table of percentage points of the F distribution by Merrington 
and Thompson. This tabulation, most of which appeared in Biometrika 33 (1943), is 
the most extensive yet made. 


18 Non-Parametric Methods *


Most of the methods presented in previous chapters depend 
upon the assumption that samples have been drawn from a normal 
population. In many problems, this assumption looks quite un- 
reasonable. In this chapter several methods will be described for 
making inferences without any assumption as to the form of dis- 
tribution in the population. Such statistical methods are called 
non-parametric or distribution-free. Examples of non-parametric
methods which have already been studied are the χ² test, the per-
centiles, and the rank-correlation coefficient.

Since the methods to be presented are valid for any parent 
population (or, in some cases, for any parent population on a con- 
tinuous variable), they could be validly applied to samples from 
normal populations. Almost always this would be unwise because 
of the loss of efficiency, as this concept was discussed on page 414. 
The tests based on normality assumptions are in general best 
possible tests for samples from a normal population and the use 
of other tests is disadvantageous. 

If the samples are not from a normal population and we use a
non-parametric test with level of significance α, then for any parent
population whatever the probability of an error of the first kind
is actually equal to α, or less than α because of discreteness. On
the other hand, using a normal theory test with level of significance
α under the same circumstances does not assure that the probability
of an error of the first kind is controlled at level α. That prob-
ability may be greater or less than α, depending upon the form of
the parent population; generally we have no way of knowing the
direction or degree of departure from α.


A. Tests for Comparison of Two Samples 


Kolmogorov-Smirnov Test. A two-sample test ! which is 
sensitive to any kind of difference in the distributions from which 


* This chapter was written by Professor Lincoln Moses of Stanford University. 
Because of limitations of space, the manuscript was slightly changed after it left his 
hands. For any errors which may have been caused by such abridgment the authors 
take full responsibility. 


the samples are drawn, is one based on the sample cumulative 
percentage polygons. If two samples have, in fact, been drawn 
from populations with the same continuous distribution, then both 
cumulative percentage distributions should resemble the parent 
distribution, and should resemble each other. If the distributions 
are too far apart at any point it is cause for rejecting the null 
hypothesis. The test statistic is D = the maximum vertical differ- 
ence between the two polygons. 


TABLE 18.1 Significance Points for Maximum Difference between Two Sample
Cumulative Distribution Functions (N₁ = N₂ ≤ 40) *


D = k/N

N             Values of k for
              α ≤ .05   α ≤ .03   α ≤ .01

10 or 11      6 7
12            6 7
13            6 7 8
14 or 15      7 8
16 or 17      7 8 9
18 or 19      8 9
20-22         8 9 10
23            9 10
24            9 11
25-27         9 10 11
28 or 29      10 12
30-32         10 12
33 or 34      11
35-39         11 12
40            12


* Adapted from Massey !? 


For N₁ = N₂ ≤ 40, values of ND calling for rejection at various
significance levels are given in Table 18.1. For larger values of N₁
and N₂, whether or not they are equal, rejection values can be
calculated from the formulas in Table 18.2.

TABLE 18.2 Significance Points for Maximum Difference between Two Sample
Cumulative Distribution Functions (N₁, N₂ > 40) *


  α        Value of D so large as to call for
           rejection at level α

 .10       $1.22\sqrt{(N_1 + N_2)/(N_1 N_2)}$
 .05       $1.36\sqrt{(N_1 + N_2)/(N_1 N_2)}$
 .025      $1.48\sqrt{(N_1 + N_2)/(N_1 N_2)}$
 .01       $1.63\sqrt{(N_1 + N_2)/(N_1 N_2)}$
 .005      $1.73\sqrt{(N_1 + N_2)/(N_1 N_2)}$


* From Smirnov ¹¹


It is important to note that the scale on which the observations
are made need not have a zero point or even have equal intervals
in order for this test procedure to be valid; it is only necessary
that the order of size of observations be reflected in the scale of
measurement. The reader can see that this statement is reasonable
by observing that if the horizontal scale along which the scores
are measured be stretched in some parts, and compressed in others,
the maximum vertical distances between the polygons will remain
unchanged. 
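
The statistic D and the large-sample rejection value may be sketched as follows. The sketch works with ungrouped scores and with cumulative step functions rather than polygons drawn from grouped data, which is an assumption of the illustration; the two small samples and the function names are hypothetical. The last lines illustrate the remark about stretching the scale: replacing every score by its exponential changes the scale but not the order, and D is unchanged.

```python
import math

def ks_two_sample_D(x, y):
    """Maximum vertical difference between two sample cumulative distributions."""
    D = 0.0
    for v in sorted(set(x) | set(y)):
        Fx = sum(1 for a in x if a <= v) / len(x)   # cumulative proportion in X
        Fy = sum(1 for b in y if b <= v) / len(y)   # cumulative proportion in Y
        D = max(D, abs(Fx - Fy))
    return D

def ks_large_sample_critical(n1, n2, coef=1.36):
    """Rejection value coef * sqrt((N1 + N2) / (N1 N2)); coef = 1.36 is the .05 level."""
    return coef * math.sqrt((n1 + n2) / (n1 * n2))

# Hypothetical scores, used only to show the computation
x = [1.2, 2.5, 3.1, 4.8]
y = [2.0, 3.7, 5.5, 6.1]
D = ks_two_sample_D(x, y)

# A monotone change of scale leaves D unchanged
D_stretched = ks_two_sample_D([math.exp(v) for v in x], [math.exp(v) for v in y])
critical_50_50 = ks_large_sample_critical(50, 50)
print(D, D_stretched, round(critical_50_50, 3))
```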

Run Test. Suppose we have a sample of N₁ observations
which may be called X and a sample of N₂ observations which
may be called Y, and wish to test the null hypothesis that the
populations from which these samples come have the same dis-
tribution. The run test has the property that if the hypothesis
tested at level α is true, then the probability of rejecting that
hypothesis is α no matter what the form of the common population
may be. If, on the other hand, the two population distributions
differ in any way whatever, then the probability of rejection tends
to one as the two sample sizes are increased without limit.

The N₁ + N₂ observations should be arranged in order of in-
creasing size. A run in the observations so ordered is defined as
a sequence of letters of the same kind which cannot be extended 
by incorporating an adjacent observation. Thus in the 21 ob- 
servations below (arranged in increasing order) there are 10 runs: 


XXX, У, X, Y;Y;YiY; ENG Ya XY eX YOU XN YYY i 


The X runs are underlined; the Y runs are overscored. If the 
two samples are from a common population the X’s and Y’s will 
generally be well mixed and the number of runs will be large. If 


the X population has a much higher median than the Y popula- 
tion, a long run of Y’s is to be expected at the left end, a long run 
of X’s at the right end, and consequently, a reduced total number 
of runs. If the X's come from a population with much greater dis- 
persion, there should be a long run of X’s at each end, and conse- 
quently a reduced total number of runs. Generally, then, rejec- 
tion of the null hypothesis will be indicated if the runs are too few 
in number. Let v represent the number of runs in the sample. 
If N₁ > 10 and N₂ > 10, the test of significance is obtained by
taking v to be normally distributed with mean

(18.1)    $\mu_v = \dfrac{2N_1N_2}{N_1 + N_2} + 1$

and variance

(18.2)    $\sigma_v^2 = \dfrac{2N_1N_2(2N_1N_2 - N_1 - N_2)}{(N_1 + N_2)^2 (N_1 + N_2 - 1)}$

In the example given, v = 10 with N₁ = 11 and N₂ = 10.

          $\mu_v = 1 + \dfrac{2(11)(10)}{11 + 10} = 11.48$

          $\sigma_v = \sqrt{\dfrac{2(11)(10)[2(11)(10) - 11 - 10]}{(11 + 10)^2 (11 + 10 - 1)}} = \sqrt{4.96} = 2.23$

and       $z = \dfrac{v - \mu_v}{\sigma_v} = \dfrac{10 - 11.48}{2.23} = -.66$

The null hypothesis is not rejected and the two samples may be 
regarded as coming from a common population. 
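
The counting of runs and the normal approximation of (18.1) and (18.2) may be sketched as below. The function names are hypothetical, the sequence of letters is assumed to be non-empty, and the short sequence used to show the counting is an invented illustration, not the 21 observations of the example. The last line uses the values v = 10, N₁ = 11, N₂ = 10 of the text.

```python
import math

def count_runs(labels):
    """Number of runs in an ordered, non-empty sequence of labels such as 'XXYXYY'."""
    return 1 + sum(1 for a, b in zip(labels, labels[1:]) if a != b)

def runs_z(v, n1, n2):
    """Normal deviate for v runs, using the mean (18.1) and variance (18.2)."""
    mean = 2 * n1 * n2 / (n1 + n2) + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n1 - n2) / ((n1 + n2) ** 2 * (n1 + n2 - 1))
    return (v - mean) / math.sqrt(var)

runs_in_sketch = count_runs("XXXYXYYYY")   # invented sequence: XXX, Y, X, YYYY
z = runs_z(10, 11, 10)                     # values of the example in the text
print(runs_in_sketch, round(z, 2))
```

Since too few runs lead to rejection, only a large negative value of z would be significant.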

Since the test is to be used for continuous variables no ties 
should appear; if there are a few ties they should be broken at 
random. Suppose for example that three X's and two Y's are tied 
at a common value. To determine where the two values of Y 
are to be placed in this list, enter a table of random numbers, and 
read down a column until you reach one of the digits 1, 2, 3, 4, or 5. 
Suppose the digit 3 is reached first and then the digit 1. Then 
one Y will take position 3 and the other take position 1. The X 
values will take the remaining positions. Then the set of 5 tied 
observations is rewritten as Y X Y X X. If any tied set consists 
only of X's or only of Y's there is no need to break the tie. An 
alternative procedure is to break the ties in all possible ways and 
for each such way to calculate the number of runs and the asso- 
ciated probability, and then to take the average of these proba- 
bilities for the probability associated with the sample. 


The run test may be used either where the original data are 
ranks or where the original data have been recorded as measure- 
ments and reduced to ranks in order to perform the test. 

It is important to note that the run test, being sensitive to any
kind of difference in the two populations, is not a very powerful
test of difference in location, that is, a situation in which one 
population tends to have higher scores or higher ranks than the 
other. If the investigator wishes to test against a particular kind 
of alternative (such as difference of means) he should usually use 
a test designed for this alternative rather than the run test. 
Several such tests will now be presented. 

The Sign Test. The sign test is one of the easiest and most 
widely applicable of statistical tests. It was used in Chapter 1 
of this text. An even number of subjects, say 2N, is divided into 
N pairs, the two members of each pair being as similar as possible 
to each other with respect to certain extraneous variables which are 
not the subject of investigation, but which may affect the observa- 
tions. In each pair of subjects, the choice of which member receives 
experimental condition A is determined by some random device. 

If it seems reasonable to assume that differences are normally 
and independently distributed with common variance and that 
differences are measured on a scale in which intervals are equal, 
the mean and standard deviation of the N differences should be 
computed and a t-test used to test the hypothesis that the mean 
difference is zero (as was done on page 152). When the investigator 
cannot make, or prefers to avoid, these assumptions, he may em- 
ploy the sign test. It tests the hypothesis that the median of the 
population of differences is zero and makes no assumption about 
the form of distribution of these differences. 

The appropriate statistic is the number of differences which
are positive (or the number negative). If the null hypothesis is
true, a sample will usually show about ½N positive and about ½N
negative differences. Clearly the sampling distribution of the
statistic m = Np is the binomial distribution (Q + P)^N with
Q = P = .5.

For samples of N ≤ 25 the required probability can be read
directly from Table IVB. Suppose d is the number of differences
of one sign and d < ½N. For instance, suppose there are 5 nega-
tive and 14 positive differences in a sample of 19 pairs. In row
N = 19 and column m = 5 of Table IVB the entry is .032. For a
two-tailed test, the result would be significant at α = 2(.032) = .064.
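
The entry read from Table IVB can be reproduced directly from the binomial distribution with Q = P = .5. The sketch below is an illustration only (the function name is hypothetical); it sums the binomial terms for 0, 1, ..., m differences of the less frequent sign.

```python
from math import comb

def sign_test_tail(m, N):
    """Probability of m or fewer differences of one sign when Q = P = .5."""
    return sum(comb(N, k) for k in range(m + 1)) / 2 ** N

one_tail = sign_test_tail(5, 19)   # 5 negative differences among 19 pairs
two_tail = 2 * one_tail
print(round(one_tail, 3), round(two_tail, 3))
```

The unrounded one-tailed value is slightly smaller than the tabled .032, so the doubled value differs from .064 only in the third decimal place.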


For larger samples, an approximation to the exact probabilities
can be obtained by computing χ² with one degree of freedom
(corrected for continuity). Both theoretical frequencies are
F_i = ½N. The values of f_i are the sample frequencies shifted by .5
toward ½N. If this method had been used for the problem of the
preceding paragraph we should have shifted the observed fre-
quencies from 5 and 14 to 5.5 and 13.5; computed χ² = 3.368 as
indicated; and found z = √3.368 = 1.84.


           f        F       (f − F)²/F
          5.5      9.5        1.684
         13.5      9.5        1.684
                         χ² = 3.368


Referring to a normal
probability table we find that for a two-sided test, this is signifi-
cant at α = .066, which is a good approximation to the exact value
previously found. In the case of a one-sided hypothesis it is neces-
sary to remember that rejection of the null hypothesis at level α
is in order only when $\chi^2 > \chi^2_{1-2\alpha}$ and the more frequent sign is
opposite to the hypothesized direction. A procedure equivalent
to computing χ² corrected for continuity is to compute

          $\dfrac{2m \pm 1}{\sqrt{N}} - \sqrt{N}$

and to regard it as a normal deviate. The 1 is added or subtracted
in such a way as to change 2m to a value nearer N. Thus for
m = 5 and N = 19 we should have


          $\dfrac{2(5) + 1}{\sqrt{19}} - \sqrt{19} = 2.524 - 4.359 = -1.835$

as before.
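
The equivalence of the two procedures may be checked numerically. In this sketch the normal deviate is computed as (2m ± 1 − N)/√N, which is the same quantity as (2m ± 1)/√N − √N, and its square is compared with the corrected χ². The function name is hypothetical, and the normal distribution of the Python standard library takes the place of the printed normal table.

```python
import math
from statistics import NormalDist

def sign_test_z(m, N):
    """Normal deviate for the sign test with the correction for continuity."""
    if 2 * m == N:
        return 0.0                       # no departure from N/2 at all
    shift = 1 if 2 * m < N else -1       # move 2m one unit toward N
    return (2 * m + shift - N) / math.sqrt(N)

z = sign_test_z(5, 19)
chi2 = z ** 2                                  # equals the corrected chi-square
two_sided = 2 * NormalDist().cdf(-abs(z))      # area in both tails
print(round(z, 3), round(chi2, 3), round(two_sided, 3))
```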

The assumptions underlying the test are that the differences 
are continuously distributed and independent. There is no as- 
sumption as to the form of the distribution. In particular nothing 
analogous to homogeneity of variance is assumed. The sign test 
is obviously well suited to data for which measurement is difficult 
but judgment between a pair of objects is possible. 

Extension of the Sign Test. If the data are measurements on
a scale of equal intervals, certain more general hypotheses may be
tested. For example, by subtracting a constant C from every
difference, that is, by taking d′_i = X_Ai − X_Bi − C, we can test the
null hypothesis that the median difference X_A − X_B in the popu-
lation is at most C. The hypothesis is rejected for too many d′_i
positive.

If the data are measurements on a scale of equal intervals and
if a zero point exists, then by considering the differences


          $d''_i = X_{Ai} - kX_{Bi}$

we can test the hypothesis

          $H: P(X_A > kX_B) \le .5$


rejecting it for too many d″_i positive. This device can be used
only where X is necessarily always positive.

As these tests are intended to be used with continuous vari-
ables, ties should occur only rarely as a result of approximation
in measurement. If a few differences are zero, such pairs are
omitted from the sample, since they can give no information for
comparing the two experimental conditions.

Signed Rank Test for Paired Observations. Like the sign 
test, this test can be used when observations are obtained in 
matched pairs. The hypothesis tested is that the differences be- 
tween observations are symmetrically distributed around a mean 
of zero. As before, 2N subjects are divided into N pairs, the two
members of each pair being matched as nearly as possible. In 
each pair the choice of which member receives experimental 
condition A is made at random. 

The difference between the score under treatment A and the 
score under treatment B in the ith pair will be denoted 


          $d_i = X_{Ai} - X_{Bi}$


To perform the test, rank the d_i in increasing order of absolute
magnitude. (Thus in the example which follows the difference
5.4 − 5.3 = .1 is given rank one because in absolute value it is
smaller than any of the other differences.) Each rank is then
suffixed by the sign of the difference from which it arose. The 
sum of all the ranks with plus signs will for convenience be called 
the positive rank sum; the sum of all the ranks with negative 
signs will be called the negative rank sum. Under the null hy- 
pothesis these two rank sums should be about equal. If most of 
the differences d_i are positive, and the negative ones are small,
then the negative rank sum will be small and rejection of the null
hypothesis in favor of E(d) > 0 is invited. A two-sided test
requires rejection if either rank sum is too small. Table 18.3 gives
approximate two-sided significance points for N ≤ 25. If more
than 25 pairs are involved, then T, the absolute value of the smaller
rank sum, is approximately normally distributed with


(18.3)    $\mu_T = \dfrac{N(N + 1)}{4}$

(18.4)    and    $\sigma_T = \sqrt{\dfrac{N(N + 1)(2N + 1)}{24}}$


TABLE 18.3 Significance Points for the Absolute Value of the Smaller Sum of
Signed Ranks (T) Obtained from Paired Observations *


  N     α = .05    α = .02    α = .01

  6        0          -          -
  7        2          0          -
  8        4          2          0
  9        6          3          2
 10        8          5          3
 11       11          7          5
 12       14         10          7
 13       17         13         10
 14       21         16         13
 15       25         20         16
 16       30         24         20
 17       35         28         23
 18       40         33         28
 19       46         38         32
 20       52         43         38
 21       59         49         43
 22       66         56         49
 23       73         62         55
 24       81         69         61
 25       89         77         68


* From Wilcoxon ® by permission of the author and the
American Cyanamid Company


The following example will serve to illustrate the computations 
for the signed rank test: 
Pair     X_A      X_B     d = X_A − X_B    Rank of d    Signed rank


 1       7.6      7.3          .3              3             3
 2       6.3      5.7          .6              4             4
 3      10.3     10.5         −.2              2             2−
 4       6.2      4.7         1.5              8             8
 5       5.4      5.3          .1              1             1
 6       9.3      8.9         1.1              6             6
 7      10.0      9.1          .9              5             5
 8       8.4      7.0         1.4              7             7


Positive rank sum = 34; Negative rank sum = 2; Smaller rank sum is T = 2.


From Table 18.3 we find that a T of 4 or less is significant at
α = .05 for the two-sided hypothesis E(d) = 0. If our hypothesis
was E(d) ≤ 0 then a negative rank total of 4 or less is significant
at α = .025 (approximately).
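
The rank sums of the example may be computed as in the following sketch. The differences are taken from the d column of the example as nearly as they can be read; only their order of absolute size affects the result. The sketch assumes that there are no zero differences and no ties in absolute value, and the function names are hypothetical. The second function gives the mean and standard deviation of T that would be used with more than 25 pairs.

```python
import math

def signed_rank_sums(diffs):
    """Positive and negative rank sums; assumes no zeros and no tied absolute values."""
    ordered = sorted(diffs, key=abs)            # rank 1 = smallest absolute difference
    pos = sum(r for r, d in enumerate(ordered, start=1) if d > 0)
    neg = sum(r for r, d in enumerate(ordered, start=1) if d < 0)
    return pos, neg

def signed_rank_normal_parameters(N):
    """Mean and standard deviation of T for the large-sample approximation."""
    mean = N * (N + 1) / 4
    sd = math.sqrt(N * (N + 1) * (2 * N + 1) / 24)
    return mean, sd

d = [0.3, 0.6, -0.2, 1.5, 0.1, 1.1, 0.9, 1.4]   # differences as read from the example
pos, neg = signed_rank_sums(d)
T = min(pos, neg)
print(pos, neg, T)
```

With N = 8 the value of T is referred to Table 18.3 rather than to the normal approximation.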

Sum of Ranks. The statistical procedure to be described
below can be used to test the hypothesis that the N₁ values of X
and the N₂ values of Y are samples from a common population.
It is sometimes called a test of difference in location. It guards
against the one-sided alternatives P(X > Y) < ½ or P(X > Y) > ½,
or against the two-sided alternative P(X > Y) ≠ ½.

Consider the two samples shown here, with N₁ = 9 and N₂ = 8.


     Group I                Group II
  Score    Rank          Score    Rank
    X                      Y

  11.5      3            15.2      7
  12.6      5             8.6      1
  19.4     13             9.3      2
  21.3     14            14.4      6
  32.5     17            15.6      8
  18.6     12            11.8      4
  17.0     10            16.3      9
  23.4     15            17.8     11
  29.6     16

     R₁ = 105               R₂ = 48


As a first step, ranks from least to greatest have been assigned to
the entire N = N₁ + N₂ = 17 scores and the sum of the ranks
obtained for each group separately. The sum of ranks for group
i will be denoted R_i. As a check, it should be noted that R₁ + R₂
must equal N(N + 1)/2. In this case

          $\dfrac{N(N + 1)}{2} = \dfrac{17(18)}{2} = 153$    and    105 + 48 = 153.

If N₁ and N₂ are each as large as 8, or larger, the statistic

(18.5)    $z = \dfrac{2R_i - N_i(N + 1)}{\sqrt{\dfrac{N_1 N_2 (N + 1)}{3}}}$

has a distribution which is approximately unit normal. If R_i is
taken from the first sample we have

          $z = \dfrac{2(105) - 9(18)}{\sqrt{\dfrac{9(8)(18)}{3}}} = \dfrac{210 - 162}{\sqrt{432}} = \dfrac{48}{20.8} = 2.31.$

If R_i is taken from the second sample we have

          $z = \dfrac{2(48) - 8(18)}{\sqrt{\dfrac{9(8)(18)}{3}}} = \dfrac{96 - 144}{\sqrt{432}} = \dfrac{-48}{20.8} = -2.31.$

For a two-sided test, this would be significant at

          α = 2(.0104) = .021,

and for a one-sided test, at level α = .0104. If the null hypothesis
states that the X-median is not greater than the Y-median we
should reject only when R₁ is significantly large (that is, R₂ is
small), and vice versa.
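
The ranking and the statistic (18.5) for the two groups of the example may be sketched as follows. The sketch assumes that no two scores are tied, as is the case here; the function name is hypothetical.

```python
import math

def rank_sum_z(x, y):
    """Rank sum of the first sample and the normal deviate of (18.5); no ties assumed."""
    pooled = sorted(x + y)
    rank = {v: i for i, v in enumerate(pooled, start=1)}   # rank 1 = smallest score
    n1, n2 = len(x), len(y)
    N = n1 + n2
    R1 = sum(rank[v] for v in x)
    z = (2 * R1 - n1 * (N + 1)) / math.sqrt(n1 * n2 * (N + 1) / 3)
    return R1, z

group_1 = [11.5, 12.6, 19.4, 21.3, 32.5, 18.6, 17.0, 23.4, 29.6]
group_2 = [15.2, 8.6, 9.3, 14.4, 15.6, 11.8, 16.3, 17.8]
R1, z = rank_sum_z(group_1, group_2)
R2, z_second = rank_sum_z(group_2, group_1)
print(R1, R2, round(z, 2), round(z_second, 2))
```

Interchanging the two samples changes only the sign of z, as in the text.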

If the number in either sample is small, a table * may be used 
to obtain exact probabilities. 

This test was proposed by Wilcoxon * in the situation in which
N₁ = N₂, and exact probabilities were given for small samples. It has
since been derived by several other people and is usually known as
the Mann-Whitney Test.”

Median Test for Two Samples. An even simpler test than that
based on ranks is arrived at by merely classifying all scores as
being above or not above the median of the combined samples.
(When N = N₁ + N₂ is an odd number one of the scores is the
median, and this median score will fall into the class of scores
“not above the median.”) A contingency table is now set up and
χ² computed, or if the number of cases is small the exact proba-
bility may be computed by the method described on page 102.


                      Group I    Group II    Total
Above median             7           1          8
Not above median         2           7          9
                         9           8         17


The data of the preceding paragraph with 16.3 as median of the
combined sample produce the contingency table shown here.
For samples as small as this, Fisher’s exact method described on
page 103 must be used. For large enough frequencies χ² corrected
for continuity is the test statistic.
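
For a fourfold table with fixed margins an exact one-sided probability can be obtained from the hypergeometric distribution. The sketch below is offered on that basis; whether it agrees in every detail with the procedure of the earlier chapter (for instance in the treatment of a two-sided hypothesis) is an assumption, and the function name is hypothetical. It gives the probability of a count in the upper left cell as large as, or larger than, the one observed.

```python
from math import comb

def fisher_upper_tail(a, b, c, d):
    """P(upper-left cell >= a) for the table [[a, b], [c, d]] with margins fixed."""
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)                        # ways of filling the first column
    return sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, min(row1, col1) + 1)) / total

# Table of the example: 7 and 1 above the median, 2 and 7 not above
p_one_sided = fisher_upper_tail(7, 1, 2, 7)
print(round(p_one_sided, 4))
```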


B. Comparison of k Samples 


Median Test for k Samples. This test is a natural extension
of the two-sample test described in the preceding paragraph.
Suppose that 28 subjects are divided at random into 5 groups
A, B, C, D, and E, and each group is afforded a different motivat-
ing rationale for excelling in a maze test. They are then given
the same maze test and the time required for completion is re-
corded. Let the time in seconds be:


          A        B        C        D        E
        18.1     16.7     24.7     18.2     12.4
        24.0     17.4     36.5     25.9     18.8
        31.7     22.4     42.1     27.0     19.3
        32.3     27.1     43.2     36.6     22.5
        35.5     35.8     48.7     37.6     35.1
        46.2              50.4     39.8


The grand median is found to lie between 27.1 and 31.7; therefore
4 observations in group A are greater than this median, which we
call X.50. In group B only one observation is greater than X.50.
Continuing in this way we arrive at the following summary:


                 A    B    C    D    E    Total

Above X.50       4    1    5    3    1     14
Below X.50       2    4    1    3    4     14
N_i              6    5    6    6    5


The data exhibit some discrepancy from what might be expected
under the null hypothesis. To assess whether the discrepancy
is significant we may treat the data as a contingency table and
compute χ² from the frequencies (which are large enough in this
example to justify that procedure). The value of χ² with 4 de-
grees of freedom is 6.93, which is not large enough to be significant
even at level α = .10; so the null hypothesis is not rejected.
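
The value of χ² quoted for the summary table can be verified by the usual contingency-table computation, in which the expected frequency of each cell is the column total multiplied by the proportion of all cases in that row. The function name in this sketch is hypothetical.

```python
def median_test_chi2(above, sizes):
    """Chi-square for a 2 x k table of counts above and not above the grand median."""
    N = sum(sizes)
    frac_above = sum(above) / N
    chi2 = 0.0
    for a, n in zip(above, sizes):
        for observed, frac in ((a, frac_above), (n - a, 1 - frac_above)):
            expected = n * frac
            chi2 += (observed - expected) ** 2 / expected
    return chi2

chi2 = median_test_chi2(above=[4, 1, 5, 3, 1], sizes=[6, 5, 6, 6, 5])
print(round(chi2, 2))
```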

If N, the number of observations in all k samples taken to-
gether, is odd then one of the observations will be X.50. Consider
that observation as being below the median.

This test is based upon the assumption that all populations 
have the same form of distribution. It is believed, however, that 
the test is not sensitive to differences in population form. 

Sum of Ranks for Comparing k Samples. This extension of
the two sample comparison by ranks to problems in which there
are k samples is due to Kruskal and Wallis. The samples are
merged and a rank of 1 assigned to the lowest score, 2 to the next
lowest, and so on. Then the sum of the ranks is found for each
of the k samples. Let N_i be the number of observations in the
ith sample; R_i the sum of the ranks assigned to the observations
in that sample; and $N = \sum_{i=1}^{k} N_i$. The test statistic to be computed
if there are no ties is

(18.6)    $H = \dfrac{12}{N(N + 1)} \sum_{i=1}^{k} \dfrac{R_i^2}{N_i} - 3(N + 1)$

If there are ties in rank because two or more observations are
equal, each observation is given the mean of the ranks which those
observations would have received had there been no ties, and H
is divided by

(18.7)    $1 - \dfrac{\sum T}{N(N^2 - 1)}$

If t is the number of observations in one tied set, T = (t − 1)t(t + 1)
for that set. ΣT is taken over all groups of ties.

If the k samples come from identical populations, and the N’s
are not very small, H is distributed approximately as χ² with
k − 1 degrees of freedom. The hypothesis is rejected for large
values of H.

By way of illustration let ranks be assigned to the data of the
preceding section for the times required for running a maze under
the various motivations A, B, C, D, and E. The ranks and the
sum of the ranks for the five groups would be as in Table 18.4.


TABLE 18.4 Computation of Analysis of Variance by Ranks (One Criterion
of Classification)


Rank of Individual

               A              B              C              D              E
          Score  Rank    Score  Rank    Score  Rank    Score  Rank    Score  Rank

          18.1     4     16.7     2     24.7    11     18.2     5     12.4     1
          24.0    10     17.4     3     36.5    20     25.9    12     18.8     6
          31.7    15     22.4     8     42.1    24     27.0    13     19.3     7
          32.3    16     27.1    14     43.2    25     36.6    21     22.5     9
          35.5    18     35.8    19     48.7    27     37.6    22     35.1    17
          46.2    26                    50.4    28     39.8    23

R_i               89             46            135             96             40
N_i                6              5              6              6              5
R_i²/N_i      1320.2          423.2         3037.5         1536.0          320.0


A check is provided by the relation $\sum_{i=1}^{k} R_i = \dfrac{N(N + 1)}{2}$:

          89 + 46 + 135 + 96 + 40 = 406    and    28(29)/2 = 406

          $H = \dfrac{12}{28(29)} \left( \dfrac{89^2}{6} + \dfrac{46^2}{5} + \dfrac{135^2}{6} + \dfrac{96^2}{6} + \dfrac{40^2}{5} \right) - 3(29) = 11.1$

and for 4 degrees of freedom χ².975 = 11.1

This result differs from the non-significant result obtained
with the median test. This is not particularly surprising. The 
present procedure employing ranks utilizes the magnitudes of the 
observations more fully and so may be expected to be somewhat 
more sensitive than a procedure which merely classifies observa- 
tions as above or not above the median. 
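
The computation of H for the maze data may be sketched as follows. The sketch assumes that there are no tied scores, so that the correction for ties is not needed; the function name is hypothetical.

```python
def kruskal_wallis_H(samples):
    """H computed from the rank sums of k samples; assumes no tied scores."""
    pooled = sorted(v for s in samples for v in s)
    rank = {v: i for i, v in enumerate(pooled, start=1)}   # rank 1 = lowest score
    N = len(pooled)
    total = sum(sum(rank[v] for v in s) ** 2 / len(s) for s in samples)
    return 12 / (N * (N + 1)) * total - 3 * (N + 1)

maze_times = [
    [18.1, 24.0, 31.7, 32.3, 35.5, 46.2],   # A
    [16.7, 17.4, 22.4, 27.1, 35.8],         # B
    [24.7, 36.5, 42.1, 43.2, 48.7, 50.4],   # C
    [18.2, 25.9, 27.0, 36.6, 37.6, 39.8],   # D
    [12.4, 18.8, 19.3, 22.5, 35.1],         # E
]
H = kruskal_wallis_H(maze_times)
print(round(H, 2))
```

The unrounded value lies just below the tabled χ².975 of 11.1 for 4 degrees of freedom, so that the two agree when each is rounded to one decimal place.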

Analysis of Variance by Ranks. The analysis of variance by
ranks is a very easy procedure and does not depend on assumptions
of normality or of homogeneity of variance. It has the further
advantage of enabling data which are inherently only ranks to be
examined for significance. If the Kruskal-Wallis H test is parallel
to the one-way analysis of variance, then a test due to Friedman
is parallel to randomized blocks.*

Suppose each of m subjects is measured under each of p situa- 
tions (or p treatments); or each of m groups takes each of p tests. 
In such situations, it is often not reasonable to assume homogeneity 
of variance. 


TABLE 18.5 Analysis of Variance with Ranked Data. Two-way Classification


Subject        Score under Treatment           Rank of Treatment

               A    B    C    D    E           A    B    C    D    E

1             11   14   13    9   20           4    2    3    5    1
2             12                               3    4    2    5    1
3                             6   25           3    2    4    5    1
4                                              5    3    2    4    1

T_j = Total                                   15   11   11   19    4


The p treatments are ranked for each subject, and the ranks
summed for all subjects as in Table 18.5. The correctness of the
totals may be checked by noting that their sum must be mp(p + 1)/2,
which for Table 18.5 is 4(5)(6)/2 = 60. If the treatments are
not very different, the rank totals may be expected to turn out
about equal. In the present example there seems to be a marked
disparity. To evaluate its significance we compute the statistic

(18.8)    $\chi_r^2 = \dfrac{12}{mp(p + 1)} \sum_{j=1}^{p} T_j^2 - 3m(p + 1)$

which has approximately the χ² distribution with p − 1 degrees
of freedom. χ²_r is identically equal to m(p − 1)W where W is the
coefficient of concordance discussed on page 284. For the data of
Table 18.5, m = 4, p = 5 and

          $\chi_r^2 = \dfrac{12}{4(5)(5 + 1)} (15^2 + 11^2 + 11^2 + 19^2 + 4^2) - 3(4)(6)$
                  = 84.4 − 72 = 12.4

For 4 degrees of freedom this is significant at the .02 level.

Actually the values of m and p in this example are so small that
the χ² distribution evaluates the significance somewhat roughly.
In fact, the 1% point for the test statistic with m = 4, p = 5, is
10.93; thus the use of the χ² table here introduces a certain dis-
tortion. Generally, if p > 7 and m ≥ 6 the χ² approximation will
be quite adequate. For values below this a set of data may be
more significant than the test indicates. If the sample sizes are
such that 3p + m ≥ 20, whether or not they are large enough for
the χ² approximation to serve, another approximation serves well
to evaluate the significance of the statistic χ²_r. We may take


(18.9)    $F = \dfrac{(m - 1)\chi_r^2}{m(p - 1) - \chi_r^2}$

to be distributed as F with

(18.10)   $n_1 = p - 1 - \dfrac{2}{m}$    degrees of freedom for the numerator

and

(18.11)   $n_2 = (m - 1)\left(p - 1 - \dfrac{2}{m}\right)$    degrees of freedom for the denominator


These degrees of freedom are not integers and interpolation in the
F table may be necessary. In the foregoing example

          $F = \dfrac{3(12.4)}{4(4) - 12.4} = \dfrac{37.2}{3.6} = 10.3$

          n₁ = 5 − 1 − 2/4 = 3.5

          n₂ = 3(3.5) = 10.5

In this example, as in many cases, interpolation is not necessary,
since the obtained F is larger than F.99 for n₁ = 3 and n₂ = 10, and
is of necessity larger than F.99 for n₁ = 3.5 and n₂ = 10.5.
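
The whole computation, from the rank totals of Table 18.5 to the F approximation, may be sketched as below. The form used for (18.9), F = (m − 1)χ²_r / [m(p − 1) − χ²_r], is the one that reproduces the numerator 37.2 of the worked example and should be regarded as an assumption of this sketch; the function name is hypothetical.

```python
def friedman_from_totals(rank_totals, m):
    """Friedman statistic and its F approximation from p rank totals and m subjects."""
    p = len(rank_totals)
    chi2_r = 12 / (m * p * (p + 1)) * sum(t * t for t in rank_totals) - 3 * m * (p + 1)
    F = (m - 1) * chi2_r / (m * (p - 1) - chi2_r)   # assumed form of (18.9)
    n1 = p - 1 - 2 / m                              # (18.10)
    n2 = (m - 1) * n1                               # (18.11)
    return chi2_r, F, n1, n2

chi2_r, F, n1, n2 = friedman_from_totals([15, 11, 11, 19, 4], m=4)
print(round(chi2_r, 1), round(F, 1), n1, n2)
```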


C. Confidence Intervals 


Confidence Interval for the Median. In Chapter 16 a formula 
for the standard error of the median was given. This standard 
error may be used to determine a confidence interval for the median 
when the form of population distribution can be assumed as known. 
We shall now describe a method for computing a confidence inter- 
val for the median when no assumption can be made about the 
form of parent population. 

The confidence interval can be developed from the simple 
notion that a random observation is as likely to exceed the popu- 
lation median as to be less than that median. All observations in 
a continuous population may be considered as falling in one of two 
classes (smaller than median, not smaller) with a probability of .5 
for each class. For a sample of observations from any population, 
the number of observations above (or below) the median will have 
the binomial distribution described in Chapter 2, with Р = .5. 

Suppose for simplicity that a sample of two observations is to
be drawn, the smaller being denoted X₁, and the larger X₂. Then
the probability that both X₁ and X₂ will be less than the popula-
tion median is (.5)(.5) = .25. Also the probability that both obser-
vations will exceed the median is .25. The remaining alternative
is that one observation will exceed and the other be less than the
median, and this situation has probability .5. In other words, the
probability that two random observations will include the popula-
tion median between them is .50.

The corresponding computation for a larger number of cases 
will be described in relation to the 11 archery scores of Table 9.8, 
in the column headed First Order. Arranged in ascending order 
these scores are 33, 51, 66, 71, 78, 80, 99, 105, 114, 134, 146. 

Let the observations in ascending order be denoted


          X₁, X₂, …, X₁₁


The probability that m or fewer observations in a sample of 11 cases
will be smaller than the population median ξ.50 is given by the appropriate
entry in row N = 11 of Table IVB. In the column headed 1, the
entry .006 is the probability that 0 or 1 observation but not more
than 1 will be less than ξ.50. Similarly in the column 9 the entry
.994 is the probability that 9 or fewer observations will be less than
ξ.50. Hence for any sample of 11 cases


          P(X₂ < ξ.50 < X₁₀) = .994 − .006 = .988


Applied to the 11 observed archery scores, this probability becomes
the confidence interval


          C(51 < ξ.50 < 134) = .988

The reader may verify by similar methods that


          P(X₃ < ξ.50 < X₉ | N = 11) = .934
          P(X₄ < ξ.50 < X₈ | N = 11) = .774


and for the 11 archery scores


          C(66 < ξ.50 < 114) = .934
          C(71 < ξ.50 < 105) = .774
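
The three confidence coefficients can be verified from the binomial distribution with P = .5 without the use of Table IVB. The sketch below is an illustration only, with a hypothetical function name: the confidence that the median lies between the r-th observation from the bottom and the r-th from the top is one minus twice the probability that fewer than r observations fall below the median.

```python
from math import comb

def median_interval_confidence(N, r):
    """Confidence that X_r < median < X_(N+1-r) for a sample of N observations."""
    lower_tail = sum(comb(N, k) for k in range(r)) / 2 ** N   # fewer than r below
    return 1 - 2 * lower_tail

scores = sorted([33, 51, 66, 71, 78, 80, 99, 105, 114, 134, 146])
results = []
for r in (2, 3, 4):
    conf = median_interval_confidence(len(scores), r)
    results.append((scores[r - 1], scores[-r], conf))
    print(scores[r - 1], scores[-r], round(conf, 4))
```

The values so computed agree with those of the text to within the rounding of the three-place table entries.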


Table IVB is useful for samples of 25 or fewer. For larger samples
we use the normal approximation to the binomial with mean
NP = .5N and standard deviation √(NPQ) = .5√N. We take
the integer next larger than z_{1−α/2}(.5√N) and count that many ob-
servations to the left and to the right from the sample median.
The observations thus reached are the limits of a confidence inter-
val with coefficient 1 − α or larger. Thus if α = .05 and N = 36
the sample median lies between X₁₈ and X₁₉, and


          z.975(.5√N) = 1.96(.5)(6) = 5.88


Counting off 6 observations each way from the median produces
the interval X₁₂ < ξ.50 < X₂₅.

Kolmogorov-Smirnov Confidence Band for Cumulative Frequency.
From the cumulative percentage distribution of a sample
this method enables one to draw a confidence band for the cumulative
percentage distribution of the population. For example
consider a sample of 10 observations:


9.1, 10.3, 11.6, 12.4, 12.5, 13.0, 13.6, 14.2, 16.1, and 18.7


The cumulative percentage distribution for these is shown by the
solid line in Figure 18-1. In order to construct a band such that
the cumulative population distribution can with confidence 1 − α
be asserted to lie within it, we enter Table 18.5 with N = number
of cases in the sample and obtain a value of $ND_\alpha$. In the row
N = 10 and column α = .01 we find $ND_\alpha = 5$ or $D_\alpha = .5$. Then one
dotted line is drawn parallel to the sample line in Figure 18-1 and
0.5 above it; a second dotted line is drawn parallel to the sample
line and 0.5 below it. We may now assert with confidence .99 that
the cumulative percentage distribution for the population from
which our sample was drawn lies inside the band bounded by these
dotted lines.


Fig. 18-1. Cumulative relative frequency (vertical axis) plotted against scale of scores (horizontal axis).
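
The band for the ten observations can be tabulated numerically. This is a minimal sketch: the function name `ks_band` is illustrative, and clipping the limits to the range 0 to 1 is an added assumption (a cumulative relative frequency cannot leave that range), not a step stated in the text.

```python
def ks_band(sample, d_alpha):
    """Return (x, lower, observed, upper) at each ordered observation."""
    xs = sorted(sample)
    n = len(xs)
    band = []
    for i, x in enumerate(xs, start=1):
        observed = i / n  # cumulative relative frequency at x
        lower = max(0.0, observed - d_alpha)  # clipped at 0 (assumption)
        upper = min(1.0, observed + d_alpha)  # clipped at 1 (assumption)
        band.append((x, lower, observed, upper))
    return band

sample = [9.1, 10.3, 11.6, 12.4, 12.5, 13.0, 13.6, 14.2, 16.1, 18.7]
band = ks_band(sample, d_alpha=0.5)  # D = .5 for N = 10, alpha = .01
for row in band:
    print("x = %5.1f  lower = %.1f  sample = %.1f  upper = %.1f" % row)
```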


TABLE 18.5 Values of $ND_\alpha$.* $D_\alpha$ = smallest absolute discrepancy between an
observed and hypothetical frequency distribution which will cause rejection of
hypothesis at significance level α


————————————————— 
N а < .05 «<03 asl N a's .05 œS 03 asl 


8 4 5 41 9 10 1 

9 4 5 42-47 10 11 
10 5 48 11 12 
11-12 5 6 49 10 12 
13-14 5 6 50-56 10 11 12 
15-18 6 7 57-59 11 13 
19 6 "i 60-65 11 12 13 
20-21 6 7 8 66-67 11 12 14 
22-24 7 8 68-70 12 14 
25 7 8 71-76 12 13 14 
26-28 7 8 9 84-87 13 14 15 
29-31 8 9 88-91 13 14 16 
32 8 9 92-94 13 14 
33-36 8 9 10 95 14 
37-40 9 10 96-100 14 15 


* Abstracted from Birnbaum.¹


The same method affords a test of goodness of fit. If the cumulative
distribution function specified by some hypothesis about the
population lies entirely within the confidence band constructed
with confidence coefficient 1 − α around the sample distribution,
then the hypothesis is not rejected. If at any point the hypothetical
curve lies outside the band, the hypothesis is rejected at level α.
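
The test can also be carried out numerically by finding the largest vertical gap between the sample step function and the hypothetical curve. The sketch below is illustrative: the hypothesis of a uniform distribution on 9 to 19 is invented purely for the example, and the function names are not from the text.

```python
def ks_statistic(sample, cdf):
    """Largest absolute gap between the sample step function and cdf."""
    xs = sorted(sample)
    n = len(xs)
    gaps = []
    for i, x in enumerate(xs, start=1):
        # The step function jumps from (i - 1)/n to i/n at x, so the
        # gap must be checked on both sides of the jump.
        gaps.append(i / n - cdf(x))
        gaps.append(cdf(x) - (i - 1) / n)
    return max(gaps)

def uniform_9_to_19(x):
    """Hypothetical population, chosen only to illustrate the method."""
    return min(1.0, max(0.0, (x - 9.0) / 10.0))

sample = [9.1, 10.3, 11.6, 12.4, 12.5, 13.0, 13.6, 14.2, 16.1, 18.7]
d = ks_statistic(sample, uniform_9_to_19)
d_alpha = 0.5  # N = 10, alpha = .01, as read from Table 18.5 in the text
rejected = d >= d_alpha
print(round(d, 2), rejected)
```

Here the largest gap is well below .5, so this illustrative hypothesis would not be rejected at the .01 level.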

Comparison of Kolmogorov-Smirnov Method with χ². The χ²
goodness of fit test, which is also non-parametric, is applicable to
either continuous or discrete variables; the Kolmogorov-Smirnov
test may be applied only to continuous variables. The χ² test can
be applied where only the form of the distribution is hypothesized
and some parameters remain to be estimated from the sample;
the K-S test can be used only where the hypothesis specifies the
hypothetical distribution completely, giving the form and the
numerical values of all parameters of the distribution. The K-S
test can be used, as in the illustration, with samples too small to
justify the χ² test. The K-S test requires less computation than
the χ² test. Finally, where the Kolmogorov-Smirnov test is applicable,
there are good reasons to believe it is a more sensitive test
than the χ² test.

Confidence Interval for $\mu_x - \mu_y$. If the distributions of two
populations can be assumed to have the same shape, differing
only by translation, then a confidence interval for the amount of
translation, that is $\mu_x - \mu_y$, can be constructed by graphical
methods.

Let the scores in one sample be denoted $X_1, X_2, \ldots, X_{N_1}$ and
in the other $Y_1, Y_2, \ldots, Y_{N_2}$. It is convenient though not
necessary to have all scores positive with the smallest not much greater
than zero. To make this transformation, add a constant (positive
or negative as may be necessary) to each of the $N_1 + N_2$ scores.
On graph paper, plot the new X-values along the horizontal and
the new Y-values along the vertical axis. Place a dot at the point
determined by each (X, Y) pair of values. There will be $N_1 N_2$
such points.

If a 45° line were drawn through the origin, all points to the
right of that line would have X > Y and all points to the left have
Y > X. Instead of a 45° line through the origin, two 45° lines will
be drawn so that a predetermined number of points will lie outside
them. The X-values of the intersections of these lines with the
horizontal axis will provide the limits of the confidence interval.
If $N_1$ and $N_2$ are both 8 or more and α is not small (small being,
e.g., .01 with $N_1$, $N_2$ = 8 or .001 with $N_1$, $N_2$ = 15) the number of
points outside the pair of lines on each side is




(18.12)   $U = \frac{N_1 N_2}{2} + z_{\alpha/2} \sqrt{\frac{N_1 N_2 (N_1 + N_2 + 1)}{12}}$


After U has been computed, take a 45° drawing triangle and construct
a line at an angle of 45° to the axes, having U of the (X,Y)
points above it and passing through the (U + 1)st point, counting
from the left. Construct a second 45° line having U of the plotted
points below it and passing through the (U + 1)st point counting
from the right. The intersections of these lines with the horizontal
(or with the vertical) axis give the confidence interval.

As an example, consider the data previously used in the discussion
of sum of ranks with $N_1 = 9$ and $N_2 = 8$.

X: 11.5, 12.6, 19.4, 21.8, 32.5, 18.6, 17.0, 23.4, and 29.6

Y: 15.2, 8.6, 9.3, 14.4, 15.6, 11.8, 16.3, 17.8

We shall find the 95% confidence interval and also, for purposes
of illustration, the 99.9% interval for $\mu_x - \mu_y$.

The first step is to subtract 8 from each observation, to plot
the resulting values of X on the horizontal axis and Y on the vertical,
and to mark the 72 (X,Y) points. The purpose of subtracting
8 is merely to produce a more compact chart, with plotted points
nearer to the origin. The second step is to compute


$U = \frac{(9)(8)}{2} - 1.96 \sqrt{\frac{(9)(8)(18)}{12}} = 15.6$


On the diagram two lines are drawn each excluding 15 points and 
passing through the 16th. (The line at the right passes through 
both the 16th and 17th points.) These are the solid lines in 
Figure 18-2. They intersect the horizontal axis at 1.4 and 10.0, 
so we write 


$C(1.4 < \mu_x - \mu_y < 10.0) = .95$

If α is very small Formula (18.12) is inadequate. Then we
read from White’s tables²⁴ the sum of ranks R which is significantly
small at level α. From this value of R, the number of points to be
excluded by each 45° line is computed as


(18.13)   $U = R - \frac{N_2 (N_2 + 1)}{2}$


where $N_2$ is the number of cases in the smaller sample. For an
interval with confidence coefficient .999, α = .001 and White’s
table shows R = 40 if $N_1 = 9$ and $N_2 = 8$. Then U = 40 − 36 = 4.




The dashed lines in Figure 18-2 correspond to this value. The
line at the left excludes four points and passes through the fifth
and sixth. The lower line excludes four points and passes through
the fifth. In this case the lower line intersects the x-axis at 14.7,
and the upper line intersects the y-axis at 3.7; it is clear that this
line if extended intersects the x-axis at −3.7. We therefore write:


$C(-3.7 < \mu_x - \mu_y < 14.7) = .999$


The justification of this procedure rests on the Mann-Whitney 
test which is described on page 434. 
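
The graphical construction has a simple arithmetic counterpart: a 45° line through the point (X, Y) meets the horizontal axis at X − Y, so the two lines pick out the (U + 1)st smallest and (U + 1)st largest of the $N_1 N_2$ differences. The sketch below is an illustration under stated assumptions: the function names are hypothetical, `u_normal` assumes the usual normal approximation to the Mann-Whitney statistic with z = 1.96 for a .95 interval, and the small data set at the end is invented to keep the arithmetic easy to follow.

```python
from math import floor, sqrt

def u_normal(n1, n2, z=1.96):
    """Points excluded on each side, by the normal approximation."""
    mean = n1 * n2 / 2
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return floor(mean - z * sd)

def u_from_rank_sum(r, n_small):
    """Points excluded on each side, from a tabled rank sum R."""
    return r - n_small * (n_small + 1) // 2

def shift_interval(xs, ys, u):
    """(U + 1)st smallest and (U + 1)st largest of all X - Y differences."""
    diffs = sorted(x - y for x in xs for y in ys)
    return diffs[u], diffs[-(u + 1)]

print(u_normal(9, 8))          # sample sizes used in the text
print(u_from_rank_sum(40, 8))  # R = 40 with 8 cases in the smaller sample

# Invented data, only to show the mechanics of shift_interval.
print(shift_interval([5, 7, 9], [1, 2], u=1))
```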


Fig. 18-2. Plot of the (X − 8, Y − 8) points; horizontal axis: Scale of X − 8; vertical axis: Scale of Y − 8.


Graphic Method of Obtaining Confidence Interval for E(d).
This convenient procedure devised by John Tukey (unpublished
paper) is related to the signed rank test for paired comparisons
described on page 432. The underlying assumption is that the
distributions of $X_{a_i}$ and of $X_{b_i}$ differ only by translation.

The differences $d_i = X_{a_i} - X_{b_i}$ (taken with regard to sign) are
denoted by heavy dots on a vertical scale as in Figure 18-3, the
top point being here marked B and the bottom C. A point midway
between B and C is found and marked A. On the horizontal line
through A a point D is marked at some convenient distance. The
line segments BD and CD form an isosceles triangle. Through
each dot on the line BC one line is drawn parallel to CD and another
parallel to BD. Each intersection (including those on the
vertical scale) is marked with a heavy dot. The number of such
dots is N(N + 1)/2.

This construction is simplified either by use of triangular co- 
ordinate paper or by choosing D with the aid of a draftsman’s 
triangle in such a way as to facilitate constructing the parallel 
lines with the same triangle. 




The confidence interval will be an interval on the vertical
scale. To find it, read from Table 18.3 $T_\alpha$, the largest significant
value for T, which is the absolute value of the smaller rank sum.
To this value add 1. In the example on page 433, which has been
used for Figure 18-3, $T_\alpha = 4$ and $T_\alpha + 1 = 5$. Find the $(T_\alpha + 1)$st
dot (which in Figure 18-3 is the 5th) from the top and draw a
horizontal line through it. Indicate the intersection of this horizontal
line with the vertical scale as U.* Now find the $(T_\alpha + 1)$st
dot from the bottom, draw a horizontal line through it, and indicate
the intersection of this line with the vertical as L. The points U
and L are then the endpoints of the confidence interval. In
Figure 18-3 these points have scale values 1.25 and .15. Therefore
we can write


$C(.15 < E(d) < 1.25) = 1 - .05 = .95$


Fig. 18-3
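
The construction can be reproduced arithmetically: two lines of equal and opposite slope drawn from the dots $d_i$ and $d_j$ intersect at a height of $(d_i + d_j)/2$, so the heavy dots are the N(N + 1)/2 pairwise averages of the differences (each difference paired with itself included). The following is a minimal sketch with hypothetical function names; the differences and the value used for $T_\alpha$ are invented for illustration and are not taken from Table 18.3 or from the example on page 433.

```python
def pairwise_averages(d):
    """Heights of all dots: averages of d[i] and d[j] for i <= j."""
    n = len(d)
    return sorted((d[i] + d[j]) / 2 for i in range(n) for j in range(i, n))

def graphic_interval(d, t_alpha):
    """(T + 1)st dot from the bottom and (T + 1)st dot from the top."""
    heights = pairwise_averages(d)
    k = t_alpha + 1
    return heights[k - 1], heights[-k]

# Invented differences and an invented T value, for illustration only.
d = [1, 2, 4]
print(len(pairwise_averages(d)))       # N(N + 1)/2 dots
print(graphic_interval(d, t_alpha=1))
```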


D. Tests of Independence 


Where each observation consists of a pair of values — say a 
measurement of intelligence and one of scholastic achievement, we 
may ask, “Are the two qualities interrelated, or independent of 
one another?” The student is well acquainted with numerous 
attacks on this problem. If the population of measurements on 
the two variables is bivariate normal then the product moment 
correlation coefficient, r, and the tests of significance discussed in 
Chapter 10 are appropriate. 


* Upper horizontal line has been incorrectly drawn through 4th instead of 5th point. 




If less is known about the form of the parent population, or if
the population is known not to be bivariate normal, the scores
may be replaced by their ranks and Spearman's rank correlation
coefficient, $R_S$, or Kendall's τ (page 286) may be calculated and
tested for significance. With fairly large samples confidence intervals
may be obtained for τ. Finally, χ² provides a test of independence.
This test may be readily applied to a bivariate distribution
by dividing the scatter diagram into 4 quadrants by
lines drawn through the medians of the two variables. The χ²
test is then applied to the numbers of scores in the quadrants. The
data need not even be of the sort which permit ranking, but may
be strictly categorical. For example, the characteristics “Favorite
sport” and “Occupation” may or may not be related. Any
statistical test of their independence will have to be one which
does not rely on order. These three methods ($R_S$, τ, and χ²) all
provide well-known distribution-free tests of the null hypothesis
of independence between two qualities or characteristics.

Corner Test of Association. There is another test of inde- 
pendence which is intuitively appealing and easy to perform, 
having as its chief drawback the necessity of drawing a scatter
diagram.¹⁷ It has the practical advantage that no tables are
necessary for evaluating significance. The test essentially ignores 
the mass of data near the center of the scatter diagram and is 
based on those observations at the periphery. If association of 
one variable with another in the extreme cases is of central interest 
then there is probably no more suitable test known. 

The application of the procedure is made clear by considering 
an example. Suppose that 18 subjects have the following scores 
on a scale of ethnocentrism (X) and of verbal intelligence (Y): 


Y 32 36 32 37 41 18 43 27 49 47 28 32 46 32 22 45 40 29 
X —11219 1.8 —11412 —425 —3 11 2 6 2 9 —4.6 


The points are plotted in a scatter diagram and then a horizontal
line is drawn through the Y median ($Y_{.50} = 34$) and a vertical
line through the X median ($X_{.50} = .4$).
Then the line $L_1$ is obtained by placing a ruler horizontally at the
top of the diagram and moving it down until a point is reached
which lies on the opposite side of the X median from the topmost
point. The number of points above the line $L_1$ is called $r_1$. If
these points (there is one in the diagram) lie in the first quadrant,
$r_1$ is recorded with a plus sign; if they lie in the second quadrant,
with a minus sign. The line $L_2$ is then located by moving a vertically
placed ruler in from the right-hand edge until a point is
reached on the opposite side of the Y median from the point farthest
to the right. The number of points to the right of $L_2$ is called
$r_2$. In the example $r_2 = 1$, and a plus sign is affixed because the
point is in the first quadrant. The lines $L_3$ and $L_4$ are similarly
constructed, and $r_3$ and $r_4$ are found by counting. Points in the
first and third quadrants are counted positively; points in the
second and fourth quadrants are counted negatively. The algebraic
sum $r = r_1 + r_2 + r_3 + r_4$ is the test statistic. Provided
N ≥ 10 the null hypothesis of independence is rejected at the 5%
level if r exceeds 11 in absolute value, and at the 1% level if r
exceeds 14 in absolute value. In the present case the null hypothesis
is not rejected since r = −1 + 1 + 3 + 1 = 4 only.


Fig. 18-4. Scatter diagram of the 18 pairs of scores; horizontal axis: Ethnocentrism Score.
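
The counting procedure can be sketched in code. This is an illustration under simplifying assumptions: there are no tied values and the number of cases is even, so that no point lies on either median line. The function names are hypothetical, and the two small data sets are invented to show the sign convention; they are far too small for the test itself, which requires N ≥ 10.

```python
from statistics import median

def leading_run(points, on_positive_side):
    """Count points from the outside in until one falls on the other
    side of the median; also report which side the run was on."""
    first = on_positive_side(points[0])
    count = 0
    for p in points:
        if on_positive_side(p) != first:
            break
        count += 1
    return count, first

def quadrant_sum(xs, ys):
    xm, ym = median(xs), median(ys)
    pts = list(zip(xs, ys))
    right = lambda p: p[0] > xm
    above = lambda p: p[1] > ym
    total = 0
    # Ruler moved down from the top: a run to the right is quadrant I (+).
    n, side = leading_run(sorted(pts, key=lambda p: -p[1]), right)
    total += n if side else -n
    # Ruler moved in from the right: a run above is quadrant I (+).
    n, side = leading_run(sorted(pts, key=lambda p: -p[0]), above)
    total += n if side else -n
    # Ruler moved up from the bottom: a run to the left is quadrant III (+).
    n, side = leading_run(sorted(pts, key=lambda p: p[1]), right)
    total += -n if side else n
    # Ruler moved in from the left: a run below is quadrant III (+).
    n, side = leading_run(sorted(pts, key=lambda p: p[0]), above)
    total += -n if side else n
    return total

# Invented data: perfectly increasing, then perfectly decreasing.
print(quadrant_sum([1, 2, 3, 4], [1, 2, 3, 4]))
print(quadrant_sum([1, 2, 3, 4], [4, 3, 2, 1]))
```

A strongly positive association piles points into the first and third quadrants and gives a large positive sum; a strongly negative one gives a large negative sum.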

If the number of cases is not an even number then each median 
line will pass through a point. If it is the same point it may be 
omitted. Otherwise the two points, one lying on the X median 
line and the other on the Y median line, are removed and one new 
point is added; it has the X coordinate of the point which lay on 
the Y median and the Y coordinate of the other. Then the test 
is carried out as before. 

The test is designed for use with continuous populations so ties
should not arise. If a small number of ties do appear, however,
they will require special treatment. If two points (or four, or six,
etc.) lie on, say, the X median line, then the first is displaced
slightly to right or left (choose the direction at random) and the
other is slightly displaced in the opposite direction. Similar
instructions apply to pairs of points lying on the Y median line.
If in constructing one of the L lines one encounters a set of tied
observations, he draws the line through them all but takes for
the contribution to $r_i$ a number (which may be a fraction):


$\frac{n_s}{n_o + 1}$


where $n_s$ is the number of the observations in the tied set which
are on the same side of the median as the initial point, and $n_o$ the
number of them on the opposite side.

It is clear that the test depends chiefly on the extreme observa- 
tions and depends upon the degree to which the data are concen- 
trated in diagonally opposite corners. A study of the diagram 
shows that a point (such as the one in the lower left-hand corner) 
which is an extreme deviate on both the X and Y scales is counted 
twice. This test does not depend on scaling of the values for X 
and Y; only the relative order positions of the X’s and Y’s are 
utilized. At the same time the procedure does not yield an index 
which measures degree of association in the way that the correla- 
tion coefficient does. 


REFERENCES 


1. Birnbaum, Z. W., “Numerical Tabulation of the Distribution of Kolmogorov’s
Statistic for Finite Sample Size,” Journal of the American Statistical Association
47 (1952), 425-441.
2. Dixon, W. J. and Mood, A. M., “The Statistical Sign Test,” Journal of the
American Statistical Association 41 (1946), 557-566.
3. Dixon, W. J. and Massey, F. J., An Introduction to Statistical Analysis, New
York, 1951, McGraw-Hill Book Co., Chapter 17.
4. Friedman, Milton, “The Use of Ranks to Avoid the Assumption of Normality,”
Journal of the American Statistical Association 32 (1937), 675-701.
5. Friedman, Milton, “A Comparison of Alternative Tests of Significance for the
Problem of m Rankings,” Annals of Mathematical Statistics 11 (1940), 86-92.
6. Kendall, M. G. and Smith, B. B., “The Problem of m Rankings,” Annals of
Mathematical Statistics 10 (1939), 275-287.
7. Kendall, M. G., The Advanced Theory of Statistics, I, London, 1945, 388-421.
8. Kendall, M. G., Rank Correlation Methods, London, 1948, Charles Griffin and
Co., Ltd.
9. Kruskal, Wm. H. and Wallis, W. A., “Use of Ranks in One-Criterion Variance
Analysis,” Journal of the American Statistical Association 47 (1952), 583-621.
10. Mann, H. B. and Whitney, D. R., “On a Test of Whether One of Two Random
Variables is Stochastically Larger than the Other,” Annals of Mathematical
Statistics 18 (1947), 50-60.
11. Massey, F. J., Jr., “The Kolmogorov-Smirnov Test for Goodness of Fit,”
Journal of the American Statistical Association 46 (1951), 68-78.
12. Massey, F. J., Jr., “The Distribution of the Maximum Deviation between Two
Sample Cumulative Step Functions,” Annals of Mathematical Statistics 22
(1951), 125-128.
13. Massey, F. J., Jr., “A Note on a Two-sample Test,” Annals of Mathematical
Statistics 22 (1951), 304-306.
14. Mood, A. M., Introduction to the Theory of Statistics, New York, 1950, McGraw-
Hill Book Co., Chapter 16.
15. Moore, G. H. and Wallis, W. A., “Time Series Significance Tests Based on
Signs of Differences,” Journal of the American Statistical Association 38 (1943),
153-164.
16. Moses, Lincoln, “Non-Parametric Statistics for Psychological Research,”
Psychological Bulletin 49 (1952), 122-143.
17. Olmstead, P. S. and Tukey, J. W., “A Corner Test for Association,” Annals of
Mathematical Statistics 18 (1947), 495-513.
18. Pitman, E. J. G., “Significance Tests which may be Applied to Samples from
any Population,” Supplement to the Journal of the Royal Statistical Society 4 (1937),
117-130, 225-232.
19. Pitman, E. J. G., “Significance Tests which may be Applied to Samples from
any Population, III. The Analysis of Variance Test,” Biometrika 29 (1938),
322-335.
20. Scheffé, Henry, “Statistical Inference in the Non-Parametric Case,” Annals
of Mathematical Statistics 14 (1943), 305-332.
21. Smirnov, N., “Table for Estimating the Goodness of Fit of Empirical
Distributions,” Annals of Mathematical Statistics 19 (1948), 279-281.
22. Wald, A. and Wolfowitz, J., “Statistical Tests Based on Permutations of the
Observations,” Annals of Mathematical Statistics 15 (1944), 358-372.
23. Wallis, W. A., “Rough and Ready Statistical Tests,” Industrial Quality Control
8 (1952), 35-40.
24. White, Colin, “The Use of Ranks in a Test of Significance for Comparing Two
Treatments,” Biometrics 8 (1952), 33-41.
25. Wilcoxon, Frank, “Individual Comparisons by Ranking Methods,” Biometrics
Bulletin 1 (1945), 80-82.
26. Wilcoxon, Frank, Some Rapid Approximate Statistical Procedures, Stamford,
Conn., 1949, American Cyanamid Co.


Appendix


Tables and Charts in the Appendix


Areas Under the Unit Normal Curve
Percentile Values of the Unit Normal Curve
Ordinates and Areas of the Normal Curve
Cumulative Binomial Probabilities. Sum of the First (m + 1)
Terms in the Expansion of $(Q + P)^N$
    A. Q = .25 and P = .75
    B. Q = P = .50
    C. Q = .75 and P = .25
Numerical Coefficients in the Expansion of $(Q + P)^N$
Interval Estimate of Population Proportion P with Confidence
Coefficient .95
95th and 99th Percentile Values of $F_{max} = s^2_{max}/s^2_{min}$ in a Set
of k Mean Squares, each Based on n Degrees of Freedom
Percentile Values of the Chi-square Distribution
Percentile Values of “Student's” Distribution
95th and 99th Percentile Values of the F Distribution
Percentile Values of r for n Degrees of Freedom when ρ = 0
Values for Transforming r into $z = \frac{1}{2} \log_e \frac{1 + r}{1 - r}$
Values of the Product-Moment Coefficient of Correlation
in a Normal Bivariate Population Corresponding to Given
Proportions of Success
Interval Estimate for ρ with Confidence Coefficient .95
Chart for Estimating the Correlation Coefficient
Percentile Values of the Rank Order Correlation Coefficient
in Samples of N Cases from an Uncorrelated Population
Percentile Estimates of the Mean
Percentile Estimates of the Standard Deviation
Transformation of a Proportion (p) to Radians (φ)
Transformation of Ranks to Standard Scores
Squares, Square Roots, and Reciprocals
Logarithms
Random Numbers
Preregistration Scores of 447 College Students on the Cooperative
Service English Test


TABLE I Areas Under the Unit Normal Curve

  z_α    z_{1-α}      α        1-α       2α

   0       0        .500      .500     1.000
 -.1      .1        .460      .540      .920
 -.2      .2        .421      .579      .841
 -.3      .3        .382      .618      .764
 -.4      .4        .345      .655      .689
 -.5      .5        .309      .691      .617
 -.6      .6        .274      .726      .549
 -.7      .7        .242      .758      .484
 -.8      .8        .212      .788      .424
 -.9      .9        .184      .816      .368
-1.0     1.0        .159      .841      .317
-1.1     1.1        .136      .864      .271
-1.2     1.2        .115      .885      .230
-1.3     1.3        .097      .903      .194
-1.4     1.4        .081      .919      .162
-1.5     1.5        .067      .933      .134
-1.6     1.6        .055      .945      .110
-1.7     1.7        .045      .955      .089
-1.8     1.8        .036      .964      .072
-1.9     1.9        .029      .971      .057
-2.0     2.0        .023      .977      .046
-2.1     2.1        .018      .982      .036
-2.2     2.2        .014      .986      .028
-2.3     2.3        .011      .989      .021
-2.4     2.4        .008      .992      .016
-2.5     2.5        .006      .994      .012
-2.6     2.6        .005      .995      .009
-2.7     2.7        .003      .997      .007
-2.8     2.8        .003      .997      .005
-2.9     2.9        .002      .998      .004
-3.0     3.0        .00135    .99865    .00270
-3.1     3.1        .00097    .99903    .00194
-3.2     3.2        .00069    .99931    .00137
-3.3     3.3        .00048    .99952    .00097
-3.4     3.4        .00034    .99966    .00067
-3.5     3.5        .00023    .99977    .00047
-3.6     3.6        .00016    .99984    .00032
-3.7     3.7        .00011    .99989    .00022
-3.8     3.8        .00007    .99993    .00014
-3.9     3.9        .00005    .99995    .00010
-4.0     4.0        .00003    .99997    .00006


TABLE II Percentile Values of the Unit Normal Curve *

Each pair of columns gives the area to the left of z, followed by z.

 Area      z    |  Area      z    |  Area      z    |  Area      z    |  Area      z
.0001   -3.719  |  .045   -1.695  |  .280   -.583   |  .700    .524   |  .950    1.645
.0002   -3.540  |  .050   -1.645  |  .300   -.524   |  .720    .583   |  .955    1.695
.0003   -3.432  |  .055   -1.598  |  .320   -.468   |  .740    .643   |  .960    1.751
.0004   -3.353  |  .060   -1.555  |  .340   -.412   |  .750    .6745  |  .965    1.812
.0005   -3.291  |  .065   -1.514  |  .360   -.358   |  .760    .706   |  .970    1.881
.001    -3.090  |  .070   -1.476  |  .380   -.305   |  .780    .772   |  .975    1.960
.002    -2.878  |  .075   -1.440  |  .400   -.253   |  .800    .842   |  .980    2.054
.003    -2.748  |  .080   -1.405  |  .420   -.202   |  .820    .915   |  .985    2.170
.004    -2.652  |  .085   -1.372  |  .440   -.151   |  .840    .994   |  .990    2.326
.005    -2.576  |  .090   -1.341  |  .460   -.100   |  .860   1.080   |  .991    2.366
.006    -2.512  |  .095   -1.311  |  .480   -.050   |  .880   1.175   |  .992    2.409
.007    -2.457  |  .100   -1.282  |  .500    .000   |  .900   1.282   |  .993    2.457
.008    -2.409  |  .120   -1.175  |  .520    .050   |  .905   1.311   |  .994    2.512
.009    -2.366  |  .140   -1.080  |  .540    .100   |  .910   1.341   |  .995    2.576
.010    -2.326  |  .160   -.994   |  .560    .151   |  .915   1.372   |  .996    2.652
.015    -2.170  |  .180   -.915   |  .580    .202   |  .920   1.405   |  .997    2.748
.020    -2.054  |  .200   -.842   |  .600    .253   |  .925   1.440   |  .998    2.878
.025    -1.960  |  .220   -.772   |  .620    .305   |  .930   1.476   |  .999    3.090
.030    -1.881  |  .240   -.706   |  .640    .358   |  .935   1.514   |  .9995   3.291
.035    -1.812  |  .250   -.6745  |  .660    .412   |  .940   1.555   |  .9996   3.353
.040    -1.751  |  .260   -.643   |  .680    .468   |  .945   1.598   |  .9999   3.719


* Entries in this table are taken from The Kelley Statistical Tables, Harvard University
Press, 1938, revised 1948, by permission of the author, Truman L. Kelley.

z is the percentile value named by the area to the left of z. Thus $z_{.30} = -.524$ is
the 30th percentile. Discussion of the reading of this table is found on page 34.


TABLE III Ordinates and Areas of the Normal Curve *
(In terms of σ units)

   z     Area   Ordinate  |    z     Area   Ordinate
  .00   .0000    .3989    |  1.00   .3413    .2420
  .01   .0040    .3989    |  1.01   .3438    .2396
  .02   .0080    .3989    |  1.02   .3461    .2371
  .03   .0120    .3988    |  1.03   .3485    .2347
  .04   .0160    .3986    |  1.04   .3508    .2323
  .05   .0199    .3984    |  1.05   .3531    .2299
  .06   .0239    .3982    |  1.06   .3554    .2275
  .07   .0279    .3980    |  1.07   .3577    .2251
  .08   .0319    .3977    |  1.08   .3599    .2227
  .09   .0359    .3973    |  1.09   .3621    .2203
  .10   .0398    .3970    |  1.10   .3643    .2179
  .11   .0438    .3965    |  1.11   .3665    .2155
  .12   .0478    .3961    |  1.12   .3686    .2131
  .13   .0517    .3956    |  1.13   .3708    .2107
  .14   .0557    .3951    |  1.14   .3729    .2083
  .15   .0596    .3945    |  1.15   .3749    .2059
  .16   .0636    .3939    |  1.16   .3770    .2036
  .17   .0675    .3932    |  1.17   .3790    .2012
  .18   .0714    .3925    |  1.18   .3810    .1989
  .19   .0753    .3918    |  1.19   .3830    .1965
  .20   .0793    .3910    |  1.20   .3849    .1942
  .21   .0832    .3902    |  1.21   .3869    .1919
  .22   .0871    .3894    |  1.22   .3888    .1895
  .23   .0910    .3885    |  1.23   .3907    .1872
  .24   .0948    .3876    |  1.24   .3925    .1849
  .25   .0987    .3867    |  1.25   .3944    .1826
  .26   .1026    .3857    |  1.26   .3962    .1804
  .27   .1064    .3847    |  1.27   .3980    .1781
  .28   .1103    .3836    |  1.28   .3997    .1758
  .29   .1141    .3825    |  1.29   .4015    .1736
  .30   .1179    .3814    |  1.30   .4032    .1714
  .31   .1217    .3802    |  1.31   .4049    .1691
  .32   .1255    .3790    |  1.32   .4066    .1669
  .33   .1293    .3778    |  1.33   .4082    .1647
  .34   .1331    .3765    |  1.34   .4099    .1626
  .35   .1368    .3752    |  1.35   .4115    .1604
  .36   .1406    .3739    |  1.36   .4131    .1582
  .37   .1443    .3725    |  1.37   .4147    .1561
  .38   .1480    .3712    |  1.38   .4162    .1539
  .39   .1517    .3697    |  1.39   .4177    .1518
  .40   .1554    .3683    |  1.40   .4192    .1497
  .41   .1591    .3668    |  1.41   .4207    .1476
  .42   .1628    .3653    |  1.42   .4222    .1456
  .43   .1664    .3637    |  1.43   .4236    .1435
  .44   .1700    .3621    |  1.44   .4251    .1415
  .45   .1736    .3605    |  1.45   .4265    .1394
  .46   .1772    .3589    |  1.46   .4279    .1374
  .47   .1808    .3572    |  1.47   .4292    .1354
  .48   .1844    .3555    |  1.48   .4306    .1334
  .49   .1879    .3538    |  1.49   .4319    .1315
  .50   .1915    .3521    |  1.50   .4332    .1295


* This table is reproduced from J. E. Wert, Educational Statistics, by courtesy of
McGraw-Hill Book Co.


Ordinates and Areas of the Normal Curve. (Concluded)
(In terms of σ units)

   z     Area   Ordinate  |    z     Area   Ordinate
 1.50   .4332    .1295    |  2.00   .4772    .0540
 1.51   .4345    .1276    |  2.01   .4778    .0529
 1.52   .4357    .1257    |  2.02   .4783    .0519
 1.53   .4370    .1238    |  2.03   .4788    .0508
 1.54   .4382    .1219    |  2.04   .4793    .0498
 1.55   .4394    .1200    |  2.05   .4798    .0488
 1.56   .4406    .1182    |  2.06   .4803    .0478
 1.57   .4418    .1163    |  2.07   .4808    .0468
 1.58   .4429    .1145    |  2.08   .4812    .0459
 1.59   .4441    .1127    |  2.09   .4817    .0449
 1.60   .4452    .1109    |  2.10   .4821    .0440
 1.61   .4463    .1092    |  2.11   .4826    .0431
 1.62   .4474    .1074    |  2.12   .4830    .0422
 1.63   .4484    .1057    |  2.13   .4834    .0413
 1.64   .4495    .1040    |  2.14   .4838    .0404
 1.65   .4505    .1023    |  2.15   .4842    .0395
 1.66   .4515    .1006    |  2.16   .4846    .0387
 1.67   .4525    .0989    |  2.17   .4850    .0379
 1.68   .4535    .0973    |  2.18   .4854    .0371
 1.69   .4545    .0957    |  2.19   .4857    .0363
 1.70   .4554    .0940    |  2.20   .4861    .0355
 1.71   .4564    .0925    |  2.21   .4864    .0347
 1.72   .4573    .0909    |  2.22   .4868    .0339
 1.73   .4582    .0893    |  2.23   .4871    .0332
 1.74   .4591    .0878    |  2.24   .4875    .0325
 1.75   .4599    .0863    |  2.25   .4878    .0317
 1.76   .4608    .0848    |  2.26   .4881    .0310
 1.77   .4616    .0833    |  2.27   .4884    .0303
 1.78   .4625    .0818    |  2.28   .4887    .0297
 1.79   .4633    .0804    |  2.29   .4890    .0290
 1.80   .4641    .0790    |  2.30   .4893    .0283
 1.81   .4649    .0775    |  2.31   .4896    .0277
 1.82   .4656    .0761    |  2.32   .4898    .0270
 1.83   .4664    .0748    |  2.33   .4901    .0264
 1.84   .4671    .0734    |  2.34   .4904    .0258
 1.85   .4678    .0721    |  2.35   .4906    .0252
 1.86   .4686    .0707    |  2.36   .4909    .0246
 1.87   .4693    .0694    |  2.37   .4911    .0241
 1.88   .4699    .0681    |  2.38   .4913    .0235
 1.89   .4706    .0669    |  2.39   .4916    .0229
 1.90   .4713    .0656    |  2.40   .4918    .0224
 1.91   .4719    .0644    |  2.41   .4920    .0219
 1.92   .4726    .0632    |  2.42   .4922    .0213
 1.93   .4732    .0620    |  2.43   .4925    .0208
 1.94   .4738    .0608    |  2.44   .4927    .0203
 1.95   .4744    .0596    |  2.45   .4929    .0198
 1.96   .4750    .0584    |  2.46   .4931    .0194
 1.97   .4756    .0573    |  2.47   .4932    .0189
 1.98   .4761    .0562    |  2.48   .4934    .0184
 1.99   .4767    .0551    |  2.49   .4936    .0180
 2.00   .4772    .0540    |  2.50   .4938    .0175




TABLE IV† Cumulative Binomial Probabilities


Sum of First (m + 1) Terms in Expansion of $(Q + P)^N$.
(Decimal points omitted to save space. Rows are N; columns are m.)


A. Q = .25 and P = .75

 N    0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
 5  001 016 104 367 763   *
 6      005 038 169 466 822   *
 7      001 013 071 244 555 867   *
 8          004 027 114 321 633 900   *
 9          001 010 049 166 399 700 925   *
10              004 020 078 224 474 756 944   *
11              001 008 034 115 287 545 803 958   *
12                  003 014 054 158 351 609 842 968   *
13                  001 006 024 080 206 416 667 873 976   *
14                      002 010 038 112 258 479 719 899 982   *
15                      001 004 017 057 148 314 539 764 920 987   *
16                          002 007 027 080 190 370 595 803 937 990
17                          001 003 012 040 107 235 426 647 836 950
18                              001 005 019 057 139 283 481 694 865
19                                  002 009 029 077 175 332 535 737
20                                  001 004 014 041 102 214 383 585
21                                      002 006 021 056 130 256 433
22                                      001 003 010 030 075 162 301
23                                          001 005 015 041 096 196
24                                          001 002 007 021 055 121
25                                              001 003 011 030 071


B. Q = P = .5

 N    0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
 5  031 188 500 812 969   *
 6  016 109 344 656 891 984   *
 7  008 062 227 500 773 938 992   *
 8  004 035 145 363 637 855 965 996   *
 9  002 020 090 254 500 746 910 980 998   *
10  001 011 055 172 377 623 828 945 989 999   *
11      006 033 113 274 500 726 887 967 994   *   *
12      003 019 073 194 387 613 806 927 981 997   *   *
13      002 011 046 133 291 500 709 867 954 989 998   *   *
14      001 006 029 090 212 395 605 788 910 971 994 999   *   *
15          004 018 059 151 304 500 696 849 941 982 996   *   *   *
16          002 011 038 105 227 402 598 773 895 962 989 998   *   *
17          001 006 025 072 166 315 500 685 834 928 975 994 999   *
18          001 004 015 048 119 240 407 593 760 881 952 985 996 999
19              002 010 032 084 180 324 500 676 820 916 968 990 998
20              001 006 021 058 132 252 412 588 748 868 942 979 994
21              001 004 013 039 095 192 332 500 668 808 905 961 987
22                  002 008 026 067 143 262 416 584 738 857 933 974
23                  001 005 017 047 105 202 339 500 661 798 895 953
24                  001 003 011 032 076 154 271 419 581 729 846 924
25                      002 007 022 054 115 212 345 500 655 788 885


† Discussion of the reading of this table is found on page 57.
* 1.0 or approximately 1.0.
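
Any entry of Table IV can be recomputed from its definition as the sum of the first (m + 1) terms of the expansion of $(Q + P)^N$. The sketch below uses an illustrative function name that is not part of the text.

```python
from math import comb

def cumulative_binomial(n, m, p):
    """Sum of the first (m + 1) terms in the expansion of (Q + P)^N,
    where Q = 1 - P and the first term is Q^N."""
    q = 1 - p
    return sum(comb(n, k) * p ** k * q ** (n - k) for k in range(m + 1))

# Entries used in the text for the median interval (row N = 11, part B).
print(round(cumulative_binomial(11, 1, 0.5), 3))
print(round(cumulative_binomial(11, 9, 0.5), 3))
```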




TABLE IV (Continued)

C. Q = .75 and P = .25

 N    0   1   2   3   4   5   6   7   8   9  10  11  12  13
 5  237 633 896 984 999   *
 6  178 534 831 962 995   *   *
 7  133 445 756 929 987 999   *   *
 8  100 367 679 886 973 996   *   *   *
 9  075 300 601 834 951 990 999   *   *   *
10  056 244 526 776 922 980 996   *   *   *   *
11  042 197 455 713 885 966 992 999   *   *   *   *
12  032 158 391 649 842 946 986 997   *   *   *   *   *
13  024 127 333 584 794 920 976 994 999   *   *   *   *   *
14  018 101 281 521 742 888 962 990 998   *   *   *   *   *
15  013 080 236 461 686 852 943 983 996 999   *   *   *   *
16  010 063 197 405 630 810 920 973 993 998   *   *   *   *
17  008 050 164 353 574 765 893 960 988 997 999   *   *   *
18  006 039 135 306 519 717 861 943 981 995 999   *   *   *
19  004 031 111 263 465 668 825 923 971 991 998   *   *   *
20  003 024 091 225 415 617 786 898 959 986 996 999   *   *
21  002 019 075 192 367 567 744 870 944 979 994 998   *   *
22  002 015 061 162 323 517 699 838 925 970 990 997 999   *
23  001 012 049 137 283 468 654 804 904 959 985 995 999   *
24  001 009 040 115 247 422 607 766 879 945 979 993 998 999
25  001 007 032 096 214 378 561 727 851 929 970 989 997 999

TABLE V* Numerical Coefficients in the Expansion of (Q + P)N 


 N |  0   1    2     3     4      5      6      7       8       9      10 | Sum
 0 |  1                                                                   | 2^0  = 1
 1 |  1   1                                                               | 2^1  = 2
 2 |  1   2    1                                                          | 2^2  = 4
 3 |  1   3    3     1                                                    | 2^3  = 8
 4 |  1   4    6     4     1                                              | 2^4  = 16
 5 |  1   5   10    10     5      1                                       | 2^5  = 32
 6 |  1   6   15    20    15      6      1                                | 2^6  = 64
 7 |  1   7   21    35    35     21      7      1                         | 2^7  = 128
 8 |  1   8   28    56    70     56     28      8       1                 | 2^8  = 256
 9 |  1   9   36    84   126    126     84     36       9       1         | 2^9  = 512
10 |  1  10   45   120   210    252    210    120      45      10       1 | 2^10 = 1024
11 |  1  11   55   165   330    462    462    330     165      55      11 | 2^11 = 2048
12 |  1  12   66   220   495    792    924    792     495     220      66 | 2^12 = 4096
13 |  1  13   78   286   715   1287   1716   1716    1287     715     286 | 2^13 = 8192
14 |  1  14   91   364  1001   2002   3003   3432    3003    2002    1001 | 2^14 = 16384
15 |  1  15  105   455  1365   3003   5005   6435    6435    5005    3003 | 2^15 = 32768
16 |  1  16  120   560  1820   4368   8008  11440   12870   11440    8008 | 2^16 = 65536
17 |  1  17  136   680  2380   6188  12376  19448   24310   24310   19448 | 2^17 = 131072
18 |  1  18  153   816  3060   8568  18564  31824   43758   48620   43758 | 2^18 = 262144
19 |  1  19  171   969  3876  11628  27132  50388   75582   92378   92378 | 2^19 = 524288
20 |  1  20  190  1140  4845  15504  38760  77520  125970  167960  184756 | 2^20 = 1048576


Note: After 2^10 part of the distribution is omitted, but the sum can be obtained thus: 


2^13 = 2(1 + 13 + 78 + 286 + 715 + 1287 + 1716) 
2^14 = 2(1 + 14 + 91 + 364 + 1001 + 2002 + 3003) + 3432 


* Discussion of the reading of this table is found on page 59. 
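The coefficients of Table V, and the rule in the note for recovering a row sum from the printed half of a row, can be checked with a short sketch such as the following. The function names are illustrative only; `math.comb` supplies the binomial coefficient.

```python
from math import comb


def expansion_coefficients(n: int) -> list[int]:
    """Coefficients of Q^(n-r) P^r, r = 0..n, in the expansion of (Q + P)^n."""
    return [comb(n, r) for r in range(n + 1)]


def sum_from_half_row(n: int) -> int:
    """Row sum rebuilt from the first half of the row, as in the table's note.

    For odd n the row is symmetric with no middle term, so the sum is twice
    the first half. For even n the single middle term is added once.
    """
    row = expansion_coefficients(n)
    half = n // 2
    if n % 2 == 1:
        return 2 * sum(row[: half + 1])
    return 2 * sum(row[:half]) + row[half]


if __name__ == "__main__":
    print(expansion_coefficients(10))
    print(sum_from_half_row(13), 2**13)
    print(sum_from_half_row(14), 2**14)
```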




CHART VI Interval Estimate of Population Proportion P with Confidence 
Coefficient .95 for N = 10, 15, 20, 30, 50, 100, 250, and 1000.* 


[Chart: pairs of confidence-belt curves, one pair for each N. Horizontal axis: Sample Value of p, 0 to 1.0. Vertical axis: Population Value of P, 0 to 1.0.]


* Reproduced by permission of the authors and the editor from Clopper, C. J. 
and Pearson, E. S., “The Use of Confidence or Fiducial Limits Illustrated in the Case 
of the Binomial,” Biometrika, 26 (1934), 404-413. 
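Limits of the kind plotted in Chart VI can be approximated numerically. The sketch below assumes the usual equal-tail construction attributed to Clopper and Pearson (each limit is the value of P at which the observed count sits at a tail probability of .025); the function names and the bisection solver are illustrative and are not taken from the text.

```python
from math import comb


def binom_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x) for a binomial variable with n trials and proportion p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def _solve_increasing(g, target: float) -> float:
    """Bisection on [0, 1] for g(p) = target, where g increases with p."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, confidence: float = 0.95):
    """Equal-tail interval for P given x successes in n trials (assumed method)."""
    alpha = 1 - confidence
    # Lower limit: P(X >= x | P) = alpha / 2
    lower = 0.0 if x == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    # Upper limit: P(X <= x | P) = alpha / 2, i.e. P(X > x | P) = 1 - alpha / 2
    upper = 1.0 if x == n else _solve_increasing(
        lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper


if __name__ == "__main__":
    print(clopper_pearson(5, 10))  # sample p = .5 with N = 10
```

Reading the chart at p = .5 on the N = 10 curves should give limits close to those printed by this sketch.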




TABLE VII 95th and 99th Percentile Values of F_max = s²_max / s²_min in a 


 n \ k      2       3       4       5       6
  2       39.0    87.5    142.    202.    266.
          199.    448.    729.   1036.   1362.
  3       15.4    27.8    39.2    50.7    62.0
          47.5    85.     120.    151.    184.
  4       9.60    15.5    20.6    25.2    29.5
          23.2    37.     49.     59.     69.
  5       7.15    10.8    13.7    16.3    18.7
          14.9    22.     28.     33.     38.
  6       5.82    8.38    10.4    12.1    13.7
          11.1    15.5    19.1    22.     25.
  7       4.99    6.94    8.44    9.70    10.8
          8.89    12.1    14.5    16.5    18.4
  8       4.43    6.00    7.18    8.12    9.03
          7.50    9.9     11.7    13.2    14.5
  9       4.03    5.34    6.31    7.11    7.80
          6.54    8.5     9.9     11.1    12.1
 10       3.72    4.85    5.67    6.34    6.92
          5.85    7.4     8.6     9.6     10.4
 12       3.28    4.16    4.79    5.30    5.72
          4.91    6.1     6.9     7.6     8.2
 15       2.86    3.54    4.01    4.37    4.68
          4.07    4.9     5.5     6.0     6.4
 20       2.46    2.95    3.29    3.54    3.76
          3.32    3.8     4.3     4.6     4.9
 30       2.07    2.40    2.61    2.78    2.91
          2.63    3.0     3.3     3.4     3.6
 60       1.67    1.85    1.96    2.04    2.11
          1.96    2.2     2.3     2.4     2.4
  ∞       1.00    1.00    1.00    1.00    1.00
          1.00    1.00    1.00    1.00    1.00


† Reproduced by permission of the author H. A. David and the Editor of Biometrika. 
Values in the column k = 2 and in the rows n = 2 and ∞ are exact. Elsewhere the 




Set of k Mean Squares, Each Based on n Degrees of Freedom † 


    7       8       9      10      11      12      n
   338.    408.    475.    550.    626.    704.     2
  1705.   2063.   2432.   2813.   3204.   3605.
   72.9    83.5    93.9    104.    114.    124.     3
   216.*   249.*   281.*   310.*   337.*   361.*
   33.6    37.5    41.1    44.6    48.0    51.4     4
   79.     89.     97.     106.    113.    120.
   20.8    22.9    24.7    26.5    28.2    29.9     5
   42.     46.     50.     54.     57.     60.
   15.0    16.3    17.5    18.6    19.7    20.7     6
   27.     30.     32.     34.     36.     37.
   11.8    12.7    13.5    14.3    15.1    15.8     7
   20.     22.     23.     24.     26.     27.
   9.78    10.5    11.1    11.7    12.2    12.7     8
   15.8    16.9    17.9    18.9    19.8    21.
   8.41    8.95    9.45    9.91    10.3    10.7     9
   13.1    13.9    14.7    15.8    16.0    16.6
   7.42    7.87    8.28    8.66    9.01    9.34    10
   11.1    11.8    12.4    12.9    13.4    13.9
   6.09    6.42    6.72    7.00    7.25    7.48    12
   8.7     9.1     9.5     9.9     10.2    10.6
   4.95    5.19    5.40    5.59    5.77    5.98    15
   6.7     7.1     7.3     7.5     7.8     8.0
   3.94    4.10    4.24    4.37    4.49    4.59    20
   5.1     5.3     5.5     5.6     5.8     5.9
   3.02    3.12    3.21    3.29    3.36    3.39    30
   3.7     3.8     3.9     4.0     4.1     4.2
   2.17    2.22    2.26    2.30    2.33    2.36    60
   2.5     2.5     2.6     2.6     2.7     2.7
   1.00    1.00    1.00    1.00    1.00    1.00     ∞
   1.00    1.00    1.00    1.00    1.00    1.00


third digit may be in error by several units. Upper value in each cell is the 95th 
percentile, lower value is the 99th. * indicates that third digit is uncertain. 
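The statistic tabled in Table VII is simply the largest of k mean squares divided by the smallest. A minimal sketch follows, under the assumption that each mean square is an ordinary sample variance from a group of n + 1 observations (so that it carries n degrees of freedom); the function name `f_max` and the sample data are illustrative only.

```python
from statistics import variance


def f_max(samples: list[list[float]]) -> float:
    """Ratio of the largest to the smallest sample variance.

    Assumption: every group has the same size, so each variance is
    based on the same number of degrees of freedom (group size - 1).
    """
    variances = [variance(group) for group in samples]
    return max(variances) / min(variances)


if __name__ == "__main__":
    # Three hypothetical groups of 3 observations each (k = 3, n = 2)
    groups = [[1, 2, 3], [2, 4, 6], [1, 1, 2]]
    ratio = f_max(groups)
    # Compare with the tabled 95th percentile for k = 3, n = 2 (87.5)
    print(ratio, ratio > 87.5)
```

The computed ratio is then compared with the tabled percentile for the appropriate k and n.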




TABLE VIII Percentile Values of the Chi-square Distribution * 

[Table body not recoverable from the scanned page.]

* Abridged from table in Biometrika, Vol. 32 (1941), and published with the 
permission of the author, Catherine M. Thompson, and the editor of Biometrika. 
Three columns (subscripts illegible in the scan) are reprinted abridged from 
R. A. Fisher and F. Yates, Statistical Tables, published by Oliver and Boyd Ltd. 
by permission of the authors and publishers. 
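Where a printed chi-square table is not at hand, a percentile can be approximated from the normal distribution. The sketch below uses the Wilson-Hilferty cube-root approximation, which is a swapped-in technique and not the method by which the table was computed; it is close for moderate and large degrees of freedom but is not exact. The function name is illustrative.

```python
from statistics import NormalDist


def chi_square_percentile_approx(p: float, df: int) -> float:
    """Approximate the 100p-th percentile of chi-square with df degrees of freedom.

    Wilson-Hilferty approximation (an assumption of this sketch):
        chi2_p ~ df * (1 - 2/(9 df) + z_p * sqrt(2/(9 df)))**3
    where z_p is the corresponding normal percentile.
    """
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * c**0.5) ** 3


if __name__ == "__main__":
    # 95th percentile for 10 degrees of freedom (exact value is about 18.31)
    print(round(chi_square_percentile_approx(0.95, 10), 2))
```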

TABLE IX Percentile Values of "Student's" Distribution * 
(For instructions on reading table, see p. [illegible].) 

[Table body not recoverable from the scanned page, which was printed sideways. Rows run from n = 1 to n = 30, then 40, 60, 120, and ∞.]

* Reprinted abridged from R. A. Fisher and F. Yates, Statistical Tables, published by 
Oliver and Boyd Ltd. by permission of the authors and publishers. 


TABLE X 95th and 99th Percentile Values of the F Distribution * 

For each n2 the upper line gives the 95th percentile (light-face type in the original) 
and the lower line gives the 99th percentile (bold-face type in the original). 
n1 = degrees of freedom for numerator; n2 = degrees of freedom for denominator 
[Columns n1 = 7 to 12 are not legible in the scan.]

 n2 |      1       2       3       4       5       6
  1 |    161     200     216     225     230     234
    |  4,052   4,999   5,403   5,625   5,764   5,859
  2 |  18.51   19.00   19.16   19.25   19.30   19.33
    |  98.49   99.01   99.17   99.25   99.30   99.33
  3 |  10.13    9.55    9.28    9.12    9.01    8.94
    |  34.12   30.81   29.46   28.71   28.24   27.91
  4 |   7.71    6.94    6.59    6.39    6.26    6.16
    |  21.20   18.00   16.69   15.98   15.52   15.21
  5 |   6.61    5.79    5.41    5.19    5.05    4.95
    |  16.26   13.27   12.06   11.39   10.97   10.67
  6 |   5.99    5.14    4.76    4.53    4.39    4.28
    |  13.74   10.92    9.78    9.15    8.75    8.47
  7 |   5.59    4.74    4.35    4.12    3.97    3.87
    |  12.25    9.55    8.45    7.85    7.46    7.19
  8 |   5.32    4.46    4.07    3.84    3.69    3.58
    |  11.26    8.65    7.59    7.01    6.63    6.37
  9 |   5.12    4.26    3.86    3.63    3.48    3.37
    |  10.56    8.02    6.99    6.42    6.06    5.80
 10 |   4.96    4.10    3.71    3.48    3.33    3.22
    |  10.04    7.56    6.55    5.99    5.64    5.39
 11 |   4.84    3.98    3.59    3.36    3.20    3.09
    |   9.65    7.20    6.22    5.67    5.32    5.07
 12 |   4.75    3.88    3.49    3.26    3.11    3.00
    |   9.33    6.93    5.95    5.41    5.06    4.82
 13 |   4.67    3.80    3.41    3.18    3.02    2.92
    |   9.07    6.70    5.74    5.20    4.86    4.62
 14 |   4.60    3.74    3.34    3.11    2.96    2.85
    |   8.86    6.51    5.56    5.03    4.69    4.46
 15 |   4.54    3.68    3.29    3.06    2.90    2.79
    |   8.68    6.36    5.42    4.89    4.56    4.32
 16 |   4.49    3.63    3.24    3.01    2.85    2.74
    |   8.53    6.23    5.29    4.77    4.44    4.20
 17 |   4.45    3.59    3.20    2.96    2.81    2.70
    |   8.40    6.11    5.18    4.67    4.34    4.10
 18 |   4.41    3.55    3.16    2.93    2.77    2.66
    |   8.28    6.01    5.09    4.58    4.25    4.01
 19 |   4.38    3.52    3.13    2.90    2.74    2.63
    |   8.18    5.93    5.01    4.50    4.17    3.94
 20 |   4.35    3.49    3.10    2.87    2.71    2.60
    |   8.10    5.85    4.94    4.43    4.10    3.87
 21 |   4.32    3.47    3.07    2.84    2.68    2.57
    |   8.02    5.78    4.87    4.37    4.04    3.81
 22 |   4.30    3.44    3.05    2.82    2.66    2.55
    |   7.94    5.72    4.82    4.31    3.99    3.76
 23 |   4.28    3.42    3.03    2.80    2.64    2.53
    |   7.88    5.66    4.76    4.26    3.94    3.71
 24 |   4.26    3.40    3.01    2.78    2.62    2.51
    |   7.82    5.61    4.72    4.22    3.90    3.67

* From Snedecor, G. W., Statistical Methods, Iowa State College Press, Inc., by 
permission of author and publisher. 


TABLE X 95th and 99th Percentile Values of the F Distribution (Continued) 

n1 = degrees of freedom for numerator; n2 = degrees of freedom for denominator 
Upper line: 95th percentile. Lower line: 99th percentile. 
[Only the columns n1 = 5 to 9 and the rows from n2 = 38 onward are legible in the scan. Blank cells are illegible.]

   n2 |     5       6       7       8       9
   38 |
      |  3.54    3.32    3.15    3.02    2.91
   40 |  2.45    2.34    2.25    2.18    2.12
      |  3.51    3.29    3.12    2.99    2.88
   42 |  2.44    2.32    2.24    2.17    2.11
      |  3.49    3.26    3.10    2.96    2.86
   44 |  2.43    2.31    2.23    2.16    2.10
      |  3.46    3.24    3.07    2.94    2.84
   46 |  2.42    2.30    2.22    2.14    2.09
      |  3.44    3.22    3.05    2.92    2.82
   48 |  2.41    2.30    2.21    2.14    2.08
      |  3.42    3.20    3.04    2.90    2.80
   50 |  2.40    2.29    2.20    2.13    2.07
      |  3.41    3.18    3.02    2.88    2.78
   55 |  2.38    2.27    2.18    2.11    2.05
      |  3.37    3.15    2.98    2.85    2.75
   60 |  2.37    2.25    2.17    2.10    2.04
      |  3.34    3.12    2.95    2.82    2.72
   65 |  2.36    2.24    2.15    2.08    2.02
      |  3.31    3.09    2.93    2.79    2.70
   70 |  2.35    2.23    2.14    2.07    2.01
      |  3.29    3.07    2.91    2.77    2.67
   80 |  2.33    2.21    2.12    2.05    1.99
      |  3.25    3.04    2.87    2.74    2.64
  100 |  2.30    2.19    2.10    2.03    1.97
      |  3.20    2.99    2.82    2.69    2.59
  125 |  2.29    2.17    2.08            1.95
      |  3.17    2.95    2.79            2.56
  150 |  2.27    2.16    2.07            1.94
      |  3.14    2.92    2.76    2.62    2.53
  200 |  2.26    2.14    2.05    1.98    1.92
      |  3.11    2.90    2.73    2.60    2.50
  400 |  2.23    2.12    2.03    1.96    1.90
      |  3.06    2.85    2.69    2.55    2.45
 1000 |  2.22    2.10    2.02    1.95    1.89
      |  3.04    2.82    2.66    2.53    2.43
    ∞ |  2.21    2.09    2.01    1.94    1.88
      |  3.02    2.80    2.64    2.51    2.41

TABLE X (Continued) 

n1 = degrees of freedom for numerator; n2 = degrees of freedom for denominator 
Upper line: 95th percentile. Lower line: 99th percentile (bold-face type in the original). 

   14    16    20    24    30    40    50    75   100   200   500     ∞ |   n2
 2.08  2.03  1.97  1.93  1.88  1.84  1.80  1.76  1.74  1.71  1.68  1.67 |   27
 2.83  2.74  2.63  2.55  2.47  2.38  2.33  2.25  2.21  2.16  2.12  2.10 |
 2.06  2.02  1.96  1.91  1.87  1.81  1.78  1.75  1.72  1.69  1.67  1.65 |   28
 2.80  2.71  2.60  2.52  2.44  2.35  2.30  2.22  2.18  2.13  2.09  2.06 |
 2.05  2.00  1.94  1.90  1.85  1.80  1.77  1.73  1.71  1.68  1.65  1.64 |   29
 2.77  2.68  2.57  2.49  2.41  2.32  2.27  2.19  2.15  2.10  2.06  2.03 |
 2.04  1.99  1.93  1.89  1.84  1.79  1.76  1.72  1.69  1.66  1.64  1.62 |   30
 2.74  2.66  2.55  2.47  2.38  2.29  2.24  2.16  2.13  2.07  2.03  2.01 |
 2.02  1.97  1.91  1.86  1.82  1.76  1.74  1.69  1.67  1.64  1.61  1.59 |   32
 2.70  2.62  2.51  2.42  2.34  2.25  2.20  2.12  2.08  2.02  1.98  1.96 |
 2.00  1.95  1.89  1.84  1.80  1.74  1.71  1.67  1.64  1.61  1.59  1.57 |   34
 2.66  2.58  2.47  2.38  2.30  2.21  2.16  2.08  2.04  1.98  1.94  1.91 |
 1.98  1.93  1.87  1.82  1.78  1.72  1.69  1.65  1.62  1.59  1.56  1.55 |   36
 2.62  2.54  2.43  2.35  2.26  2.17  2.12  2.04  2.00  1.94  1.90  1.87 |
 1.96  1.92  1.85  1.80  1.76  1.71  1.67  1.63  1.60  1.57  1.54  1.53 |   38
 2.59  2.51  2.40  2.32  2.22  2.14  2.08  2.00  1.97  1.90  1.86  1.84 |
 1.95  1.90  1.84  1.79  1.74  1.69  1.66  1.61  1.59  1.55  1.53  1.51 |   40
 2.56  2.49  2.37  2.29  2.20  2.11  2.05  1.97  1.94  1.88  1.84  1.81 |
 1.94  1.89  1.82  1.78  1.73  1.68  1.64  1.60  1.57  1.54  1.51  1.49 |   42
 2.54  2.46  2.35  2.26  2.17  2.08  2.02  1.94  1.91  1.85  1.80  1.78 |
 1.92  1.88  1.81  1.76  1.72  1.66  1.63  1.58  1.56  1.52  1.50  1.48 |   44
 2.52  2.44  2.32  2.24  2.15  2.06  2.00  1.92  1.88  1.82  1.78  1.75 |
 1.91  1.87  1.80  1.75  1.71  1.65  1.62  1.57  1.54  1.51  1.48  1.46 |   46
 2.50  2.42  2.30  2.22  2.13  2.04  1.98  1.90  1.86  1.80  1.76  1.72 |
 1.90  1.86  1.79  1.74  1.70  1.64  1.61  1.56  1.53  1.50  1.47  1.45 |   48
 2.48  2.40  2.28  2.20  2.11  2.02  1.96  1.88  1.84  1.78  1.73  1.70 |
 1.90  1.85  1.78  1.74  1.69  1.63  1.60  1.55  1.52  1.48  1.46  1.44 |   50
 2.46  2.39  2.26  2.18  2.10  2.00  1.94  1.86  1.82  1.76  1.71  1.68 |
 1.88  1.83  1.76  1.72  1.67  1.61  1.58  1.52  1.50  1.46  1.43  1.41 |   55
 2.43  2.35  2.23  2.15  2.06  1.96  1.90  1.82  1.78  1.71  1.66  1.64 |
 1.86  1.81  1.75  1.70  1.65  1.59  1.56  1.50  1.48  1.44  1.41  1.39 |   60
 2.40  2.32  2.20  2.12  2.03  1.93  1.87  1.79  1.74  1.68  1.63  1.60 |
 1.85  1.80  1.73  1.68  1.63  1.57  1.54  1.49  1.46  1.42  1.39  1.37 |   65
 2.37  2.30  2.18  2.09  2.00  1.90  1.84  1.76  1.71  1.64  1.60  1.56 |
 1.84  1.79  1.72  1.67  1.62  1.56  1.53  1.47  1.45  1.40  1.37  1.35 |   70
 2.35  2.28  2.15  2.07  1.98  1.88  1.82  1.74  1.69  1.62  1.56  1.53 |
 1.82  1.77  1.70  1.65  1.60  1.54  1.51  1.45  1.42  1.38  1.35  1.32 |   80
 2.32  2.24  2.11  2.03  1.94  1.84  1.78  1.70  1.65  1.57  1.52  1.49 |
 1.79  1.75  1.68  1.63  1.57  1.51  1.48  1.42  1.39  1.34  1.30  1.28 |  100
 2.26  2.19  2.06  1.98  1.89  1.79  1.73  1.64  1.59  1.51  1.46  1.43 |
 1.77  1.72  1.65  1.60  1.55  1.49  1.45  1.39  1.36  1.31  1.27  1.25 |  125
 2.23  2.15  2.03  1.94  1.85  1.75  1.68  1.59  1.54  1.46  1.40  1.37 |
 1.76  1.71  1.64  1.59  1.54  1.47  1.44  1.37  1.34  1.29  1.25  1.22 |  150
 2.20  2.12  2.00  1.91  1.83  1.72  1.66  1.56  1.51  1.43  1.37  1.33 |
 1.74  1.69  1.62  1.57  1.52  1.45  1.42  1.35  1.32  1.26  1.22  1.19 |  200
 2.17  2.09  1.97  1.88  1.79  1.69  1.62  1.53  1.48  1.39  1.33  1.28 |
 1.72  1.67  1.60  1.54  1.49  1.42  1.38  1.32  1.28  1.22  1.16  1.13 |  400
 2.12  2.04  1.92  1.84  1.74  1.64  1.57  1.47  1.42  1.32  1.24  1.19 |
 1.70  1.65  1.58  1.53  1.47  1.41  1.36  1.30  1.26  1.19  1.13  1.08 | 1000
 2.09  2.01  1.89  1.81  1.71  1.61  1.54  1.44  1.38  1.28  1.19  1.11 |
 1.69  1.64  1.57  1.52  1.46  1.40  1.35  1.28  1.24  1.17  1.11  1.00 |    ∞
 2.07  1.99  1.87  1.79  1.69  1.59  1.52  1.41  1.36  1.25  1.15  1.00 |


TABLE XI * Percentile Values of r for n Degrees of Freedom when ρ = 0 

  n |  r.95   r.975   r.99   r.995  r.9995 |    n |  r.95   r.975   r.99   r.995  r.9995
  1 |  .988   .997   .9995   .9999  1.000  |   30 |  .296   .349    .409   .449   .554
  2 |  .900   .950   .980    .990   .999   |   35 |  .275   .325    .381   .418   .519
  3 |  .805   .878   .934    .959   .991   |   40 |  .257   .304    .358   .393   .490
  4 |  .729   .811   .882    .917   .974   |   45 |  .243   .288    .338   .372   .465
  5 |  .669   .754   .833    .874   .951   |   50 |  .231   .273    .322   .354   .443
  6 |  .622   .707   .789    .834   .925   |   55 |  .220   .261    .307   .338   .424
  7 |  .582   .666   .750    .798   .898   |   60 |  .211   .250    .295   .325   .408
  8 |  .550   .632   .716    .765   .872   |   65 |  .203   .240    .284   .312   .393
  9 |  .521   .602   .685    .735   .847   |   70 |  .195   .232    .274   .302   .380
 10 |  .497   .576   .658    .708   .823   |   75 |  .189   .224    .264   .292   .368
 11 |  .476   .553   .634    .684   .801   |   80 |  .183   .217    .256   .283   .357
 12 |  .458   .532   .612    .661   .780   |   85 |  .178   .211    .249   .275   .347
 13 |  .441   .514   .592    .641   .760   |   90 |  .173   .205    .242   .267   .338
 14 |  .426   .497   .574    .623   .742   |   95 |  .168   .200    .236   .260   .329
 15 |  .412   .482   .558    .606   .725   |  100 |  .164   .195    .230   .254   .321
 16 |  .400   .468   .542    .590   .708   |  125 |  .147   .174    .206   .228   .288
 17 |  .389   .456   .528    .575   .693   |  150 |  .134   .159    .189   .208   .264
 18 |  .378   .444   .516    .561   .679   |  175 |  .124   .148    .174   .194   .248
 19 |  .369   .433   .503    .549   .665   |  200 |  .116   .138    .164   .181   .235
 20 |  .360   .423   .492    .537   .652   |  300 |  .095   .113    .134   .148   .188
 22 |  .344   .404   .472    .515   .629   |  500 |  .074   .088    .104   .115   .148
 24 |  .330   .388   .453    .496   .607   | 1000 |  .052   .062    .073   .081   .104
 25 |  .323   .381   .445    .487   .597   | 2000 |  .037   .044    .052   .058   .074
    | -r.05  -r.025  -r.01  -r.005  -r.0005 |     | -r.05  -r.025  -r.01  -r.005  -r.0005


* Reprinted abridged from R. A. Fisher and F. Yates, Statistical Tables, published 
by Oliver and Boyd Ltd. by permission of the authors and publishers. 
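When ρ = 0 the percentile of r for n degrees of freedom is tied to the corresponding percentile of "Student's" distribution by r = t / √(t² + n). The sketch below applies that relation; the t values passed in are taken from a standard t table and are supplied by the user, and the function name is illustrative.

```python
from math import sqrt


def r_from_t(t: float, n: int) -> float:
    """Percentile of r (rho = 0) from the matching percentile of t.

    n is the number of degrees of freedom, as in the heading of Table XI.
    """
    return t / sqrt(t * t + n)


if __name__ == "__main__":
    # t.975 for 10 degrees of freedom is 2.228 (from a t table)
    print(round(r_from_t(2.228, 10), 3))
    # t.95 for 10 degrees of freedom is 1.812
    print(round(r_from_t(1.812, 10), 3))
```

Both results can be compared with the row n = 10 of the table.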




TABLE XII * Values for Transforming r into z = ½ log_e [(1 + r)/(1 − r)] 


  z |  .00    .01  |  .02    .03  |  .04    .05  |  .06    .07  |  .08    .09
 .0 | .0000  .0100 | .0200  .0300 | .0400  .0500 | .0599  .0699 | .0798  .0898
 .1 | .0997  .1096 | .1194  .1293 | .1391  .1489 | .1587  .1684 | .1781  .1878
 .2 | .1974  .2070 | .2165  .2260 | .2355  .2449 | .2543  .2636 | .2729  .2821
 .3 | .2913  .3004 | .3095  .3185 | .3275  .3364 | .3452  .3540 | .3627  .3714
 .4 | .3800  .3885 | .3969  .4053 | .4136  .4219 | .4301  .4382 | .4462  .4542
 .5 | .4621  .4700 | .4777  .4854 | .4930  .5005 | .5080  .5154 | .5227  .5299
 .6 | .5370  .5441 | .5511  .5581 | .5649  .5717 | .5784  .5850 | .5915  .5980
 .7 | .6044  .6107 | .6169  .6231 | .6291  .6352 | .6411  .6469 | .6527  .6584
 .8 | .6640  .6696 | .6751  .6805 | .6858  .6911 | .6963  .7014 | .7064  .7114
 .9 | .7163  .7211 | .7259  .7306 | .7352  .7398 | .7443  .7487 | .7531  .7574
1.0 | .7616  .7658 | .7699  .7739 | .7779  .7818 | .7857  .7895 | .7932  .7969
1.1 | .8005  .8041 | .8076  .8110 | .8144  .8178 | .8210  .8243 | .8275  .8306
1.2 | .8337  .8367 | .8397  .8426 | .8455  .8483 | .8511  .8538 | .8565  .8591
1.3 | .8617  .8643 | .8668  .8693 | .8717  .8741 | .8764  .8787 | .8810  .8832
1.4 | .8854  .8875 | .8896  .8917 | .8937  .8957 | .8977  .8996 | .9015  .9033
1.5 | .9052  .9069 | .9087  .9104 | .9121  .9138 | .9154  .9170 | .9186  .9202
1.6 | .9217  .9232 | .9246  .9261 | .9275  .9289 | .9302  .9316 | .9329  .9342
1.7 | .9354  .9367 | .9379  .9391 | .9402  .9414 | .9425  .9436 | .9447  .9458
1.8 | .9468  .9478 | .9488  .9498 | .9508  .9518 | .9527  .9536 | .9545  .9554
1.9 | .9562  .9571 | .9579  .9587 | .9595  .9603 | .9611  .9619 | .9626  .9633
2.0 | .9640  .9647 | .9654  .9661 | .9668  .9674 |
2.1 | .9705  .9710 | .9716  .9722 | .9727  .9732 |
2.2 | .9757  .9762 | .9767  .9771 | .9776  .9780 |
2.3 | .9801  .9805 | .9809  .9812 | .9816  .9820 |
2.4 | .9837  .9840 | .9843  .9846 | .9849  .9852 |
2.5 | .9866  .9869 | .9871  .9874 | .9876  .9879 |
2.6 | .9890  .9892 | .9895  .9897 | .9899  .9901 |
2.7 | .9910  .9912 | .9914  .9915 | .9917  .9919 |
2.8 | .9926  .9928 | .9929  .9931 | .9932  .9933 |
2.9 | .9940  .9941 | .9942  .9943 | .9944  .9945 |

3.0 | .9951 
4.0 | .9993 
5.0 | .9999 


* Reprinted abridged from R. A. Fisher and F. Yates, Statistical Tables, published 
by Oliver and Boyd Ltd. by permission of the authors and publishers. The figures in 
the body of the table are values of r corresponding to z-values read from the scales on the 
left and top of the table. 
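Since z = ½ log_e [(1 + r)/(1 − r)] is the inverse hyperbolic tangent of r, the body of Table XII can be reproduced with the standard library. The following is a small sketch; the function names are illustrative.

```python
from math import atanh, tanh


def r_to_z(r: float) -> float:
    """Transform a correlation r into z = 0.5 * ln((1 + r) / (1 - r))."""
    return atanh(r)


def z_to_r(z: float) -> float:
    """Recover r from z, which is what the body of the table lists."""
    return tanh(z)


if __name__ == "__main__":
    print(round(z_to_r(1.00), 4))  # entry for z = 1.0, column .00
    print(round(z_to_r(0.55), 4))  # entry for z = .5, column .05
    print(round(r_to_z(0.50), 4))
```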


TABLE XIII A Table of the Values of the Product-moment Coefficient of Correlation in a 
Normal Bivariate Population Corresponding to Given Proportions of Successes * 

[Table body not recoverable from the scanned pages, which were printed sideways. The table is entered with two quantities: the proportion of successes in the 27 per cent scoring highest on the continuous variable, and the proportion of successes in the 27 per cent scoring lowest on the continuous variable.]

* Reproduced from Thorndike, R. L., Personnel Selection, Test and Measurement Techniques, 
New York, 1949, by courtesy of John Wiley and Sons and by permission of John C. Flanagan, 
who computed the table and distributed it privately. 


[Chart: confidence belts, one pair of curves for each sample size. Horizontal axis: Scale of r, −1.0 to 1.0. Vertical axis: −1.0 to 1.0.]


CHART XIV Confidence belt for the correlation coefficient ρ when α = .05. 
Numbers on the curves indicate sample size. (Reproduced with the permission 
of E. S. Pearson from David, F. N., Tables of the Correlation Coefficient, The 
Biometrika Office, London.) 


[Chart: curve for reading off an estimate of the correlation coefficient. Vertical axis: Estimate of ρ. Horizontal axis marked from 0.5 to 1.0.]


CHART XV Chart for estimating the correlation coefficient. (From Frederick 
Mosteller, “On Some Useful ‘Inefficient’ Statistics,” Annals of Mathematical 
Statistics, 17 (1946), p. 405. Used with the permission of the author and the editor 
of Annals.) The use of this chart is discussed on page 420. 




TABLE XVI Percentile Values of the Rank Order Correlation Coefficient in 
Samples of N Cases from an Uncorrelated Population 


Вю Rn Re Re Ru Въ Re Ел Ёз Вл 


1.00 1.00 
.80 .80 90 -90 .90 .90 .90 .90 .90 1.00 
717 77 Mis 77 83 83 .83 .89 94 1.00 
.68 .68 71 71 75 75 79 82 .86 .89 
.63 64 67 67 .69 yfi 44 76 81 .86 


-Rw -Ro -Ræ -Ro — В -Ro -Ru —Ёъ – Ко — Ro 


со ы Ф ©л н» ы 


TABLE XVII Estimates of the Mean of a Normal 
Population Based on Percentiles.* 


Estimate Efficiency 
Xu 64 
3(X » + Xs) 81 
SX + Ху + X.) 88 
ЖХ. + Хо + Хз» + Xas) 91 
Хо + Xo + Хы + Хо + Xo) .93 


* Based on Mosteller, F., “On some useful inefficient statistics,” Annals of 
Mathematical Statistics, 17 (1946), pp. 377-408. 


TABLE XVIII Estimates of the Standard Deviation of a Normal 
Population Based on Percentiles.* 




Estimate Efficiency 
839(X 93 — Хо) 65 
ATIQX + Xo — X10 Хо) 76 


-120(Х.в + Хо + Xa — X — Хо Xo) .87 


* Based on Mosteller, F., “On some useful inefficient statistics,” Annals of 
Mathematical Statistics, 17 (1946), pp. 377-408. 




TABLE XIX * 
A. Transformation of a Proportion (p) to Radians (φ) [φ = 2 arcsin √p] 

[Blank cells are illegible in the scan.]

   p      φ    |   p      φ    |  p      φ    |  p      φ    |   p      φ    |   p      φ
 .001   .0633  | .036   .3818  | .26  1.0701  | .61  1.7926  | .951  2.6952  | .986  2.9044
 .002   .0895  | .037   .3871  | .27  1.0928  | .62  1.8132  | .952  2.6998  | .987  2.9131
 .003   .1096  | .038   .3924  | .28  1.1152  | .63  1.8338  | .953  2.7045  | .988  2.9221
 .004   .1266  | .039   .3976  | .29  1.1374  | .64  1.8546  | .954  2.7093  | .989  2.9315
 .005   .1415  | .040   .4027  | .30  1.1593  | .65  1.8755  | .955  2.7141  | .990  2.9413
 .006   .1551  | .041   .4078  | .31  1.1810  | .66  1.8965  | .956  2.7189  | .991  2.9516
 .007   .1675  | .042   .4128  | .32  1.2025  | .67  1.9177  | .957  2.7238  | .992  2.9625
 .008   .1791  | .043   .4178  | .33  1.2239  | .68  1.9391  | .958  2.7288  | .993  2.9741
 .009   .1900  | .044   .4227  | .34  1.2451  | .69  1.9606  | .959  2.7338  | .994  2.9865
 .010   .2003  | .045   .4275  | .35  1.2661  | .70  1.9823  | .960  2.7389  | .995  3.0001
 .011   .2101  | .046          | .36  1.2870  | .71  2.0042  | .961  2.7440  | .996  3.0150
 .012   .2195  | .047          | .37  1.3078  | .72  2.0264  | .962  2.7492  | .997  3.0320
 .013   .2285  | .048          | .38  1.3284  | .73  2.0488  | .963  2.7545  | .998  3.0521
 .014   .2372  | .049          | .39  1.3490  | .74  2.0715  | .964  2.7598  | .999  3.0783
 .015   .2456  | .050          | .40  1.3694  | .75  2.0944  | .965  2.7652  |
 .016   .2537  | .06           | .41  1.3898  | .76  2.1177  | .966  2.7707  |
 .017   .2615  | .07           | .42  1.4101  | .77  2.1412  | .967  2.7762  |
 .018   .2691  | .08           | .43  1.4303  | .78  2.1652  | .968  2.7819  |
 .019          | .09           | .44  1.4505  | .79  2.1895  | .969  2.7876  |
 .020          | .10           | .45  1.4706  | .80  2.2143  | .970  2.7934  |
 .021          | .11           | .46  1.4907  | .81  2.2395  | .971  2.7993  |
 .022   .2978  | .12           | .47  1.5108  | .82  2.2653  | .972  2.8053  |
 .023   .3045  | .13           | .48  1.5308  | .83  2.2916  | .973  2.8115  |
 .024   .3111  | .14           | .49  1.5508  | .84  2.3186  | .974  2.8177  |
 .025   .3176  | .15           | .50  1.5708  | .85  2.3462  | .975  2.8240  |
 .026   .3239  | .16           | .51  1.5908  | .86  2.3746  | .976  2.8305  |
 .027   .3301  | .17           | .52  1.6108  | .87  2.4037  | .977  2.8371  |
 .028   .3362  | .18           | .53  1.6308  | .88  2.4341  | .978  2.8438  |
 .029   .3423  | .19           | .54  1.6509  | .89  2.4655  | .979  2.8507  |
 .030   .3482  | .20           | .55  1.6710  | .90  2.4981  | .980  2.8578  |
 .031   .3540  | .21           | .56          | .91  2.5319  | .981  2.8650  |
 .032   .3597  | .22           | .57          | .92  2.5681  | .982  2.8725  |
 .033   .3654  | .23           | .58          | .93  2.6062  | .983  2.8801  |
 .034   .3709  | .24           | .59          | .94  2.6467  | .984  2.8879  |
 .035   .3764  | .25           | .60          | .950 2.6906  | .985  2.8960  |

* Discussion of the use of this table is found on page 423. 

B. Values of φ0 and φ1 for Samples of 10 to 50 * 

 Sample size    φ0       φ1    |  Sample size    φ0       φ1
     10       .3176   2.8240   |      30       .1828   2.9588
     11       .3027   2.8389   |      31       .1798   2.9617
     12       .2897   2.8519   |      32       .1770   2.9646
     13       .2782   2.8633   |      33       .1743   2.9673
     14       .2681   2.8735   |      34       .1717   2.9699
     15       .2589   2.8827   |      35       .1692   2.9724
     16       .2507   2.8909   |      36       .1669   2.9747
     17       .2431   2.8985   |      37       .1646   2.9770
     18       .2363   2.9053   |      38       .1624   2.9792
     19       .2299   2.9117   |      39       .1603   2.9813
     20       .2241   2.9175   |      40       .1583   2.9833
     21       .2187   2.9229   |      41       .1563   2.9853
     22       .2136   2.9280   |      42       .1545   2.9871
     23       .2089   2.9327   |      43       .1526   2.9889
     24       .2045   2.9371   |      44       .1509   2.9907
     25       .2003   2.9413   |      45       .1492   2.9924
     26       .1964   2.9452   |      46       .1476   2.9940
     27       .1927   2.9488   |      47       .1460   2.9956
     28       .1893   2.9523   |      48       .1445   2.9971
     29       .1860   2.9556   |      49       .1430   2.9986
                               |      50       .1415   3.0001



* From Techniques of Statistical Analysis by permission of the author of Chapter 16, 
Churchill Eisenhart, and the publisher, the McGraw-Hill Book Co. 
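The entries in Part A follow directly from φ = 2 arcsin √p, so any value missing from a worn copy can be recomputed. In the sketch below the function names are illustrative. The rule used for Part B (φ0 taken as the transform of 1/(4N), and φ1 = π − φ0) is an inference from the printed values, not a statement found in the text, and should be treated as an assumption.

```python
from math import asin, pi, sqrt


def phi(p: float) -> float:
    """Transform a proportion p into radians: phi = 2 * arcsin(sqrt(p))."""
    return 2.0 * asin(sqrt(p))


def phi_limits(n: int) -> tuple[float, float]:
    """Values matching Part B for sample size n.

    Assumption inferred from the printed entries: phi0 = phi(1 / (4n))
    and phi1 = pi - phi0.
    """
    phi0 = phi(1.0 / (4.0 * n))
    return phi0, pi - phi0


if __name__ == "__main__":
    print(round(phi(0.30), 4))   # Part A, p = .30
    print(round(phi(0.50), 4))   # Part A, p = .50
    print(tuple(round(v, 4) for v in phi_limits(20)))  # Part B, sample size 20
```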

TABLE XX Transformation of Ranks to Standard Scores 

[Table body not recoverable from the scanned page, which was printed sideways. Rows: Rank Order. Columns: Number of Persons Ranked.]

From Improvement of Grading Practices for Air Training Command Schools, ATRC Manual 50-900-9. 



TABLE XXI Squares, Square Roots, and Reciprocals


n | $n^2$ | $\sqrt{n}$ | $\sqrt{10n}$ | $1/n$ | n | $n^2$ | $\sqrt{n}$ | $\sqrt{10n}$ | $1/n$
1 | 1 | 1.000 | 3.162 | 1.00000 | 51 | 2601 | 7.141 | 22.583 | .01961
2 | 4 | 1.414 | 4.472 | .50000 | 52 | 2704 | 7.211 | 22.804 | .01923
3 | 9 | 1.732 | 5.477 | .33333 | 53 | 2809 | 7.280 | 23.022 | .01887
4 | 16 | 2.000 | 6.325 | .25000 | 54 | 2916 | 7.348 | 23.238 | .01852
5 | 25 | 2.236 | 7.071 | .20000 | 55 | 3025 | 7.416 | 23.452 | .01818
6 | 36 | 2.449 | 7.746 | .16667 | 56 | 3136 | 7.483 | 23.664 | .01786
7 | 49 | 2.646 | 8.367 | .14286 | 57 | 3249 | 7.550 | 23.875 | .01754
8 | 64 | 2.828 | 8.944 | .12500 | 58 | 3364 | 7.616 | 24.083 | .01724
9 | 81 | 3.000 | 9.487 | .11111 | 59 | 3481 | 7.681 | 24.290 | .01695
10 | 100 | 3.162 | 10.000 | .10000 | 60 | 3600 | 7.746 | 24.495 | .01667
11 | 121 | 3.317 | 10.488 | .09091 | 61 | 3721 | 7.810 | 24.698 | .01639
12 | 144 | 3.464 | 10.954 | .08333 | 62 | 3844 | 7.874 | 24.900 | .01613
13 | 169 | 3.606 | 11.402 | .07692 | 63 | 3969 | 7.937 | 25.100 | .01587
14 | 196 | 3.742 | 11.832 | .07143 | 64 | 4096 | 8.000 | 25.298 | .01562
15 | 225 | 3.873 | 12.247 | .06667 | 65 | 4225 | 8.062 | 25.495 | .01538
16 | 256 | 4.000 | 12.649 | .06250 | 66 | 4356 | 8.124 | 25.690 | .01515
17 | 289 | 4.123 | 13.038 | .05882 | 67 | 4489 | 8.185 | 25.884 | .01493
18 | 324 | 4.243 | 13.416 | .05556 | 68 | 4624 | 8.246 | 26.077 | .01471
19 | 361 | 4.359 | 13.784 | .05263 | 69 | 4761 | 8.307 | 26.268 | .01449
20 | 400 | 4.472 | 14.142 | .05000 | 70 | 4900 | 8.367 | 26.458 | .01429
21 | 441 | 4.583 | 14.491 | .04762 | 71 | 5041 | 8.426 | 26.646 | .01408
22 | 484 | 4.690 | 14.832 | .04545 | 72 | 5184 | 8.485 | 26.833 | .01389
23 | 529 | 4.796 | 15.166 | .04348 | 73 | 5329 | 8.544 | 27.019 | .01370
24 | 576 | 4.899 | 15.492 | .04167 | 74 | 5476 | 8.602 | 27.203 | .01351
25 | 625 | 5.000 | 15.811 | .04000 | 75 | 5625 | 8.660 | 27.386 | .01333
26 | 676 | 5.099 | 16.125 | .03846 | 76 | 5776 | 8.718 | 27.568 | .01316
27 | 729 | 5.196 | 16.432 | .03704 | 77 | 5929 | 8.775 | 27.749 | .01299
28 | 784 | 5.292 | 16.733 | .03571 | 78 | 6084 | 8.832 | 27.928 | .01282
29 | 841 | 5.385 | 17.029 | .03448 | 79 | 6241 | 8.888 | 28.107 | .01266
30 | 900 | 5.477 | 17.321 | .03333 | 80 | 6400 | 8.944 | 28.284 | .01250
31 | 961 | 5.568 | 17.607 | .03226 | 81 | 6561 | 9.000 | 28.460 | .01235
32 | 1024 | 5.657 | 17.889 | .03125 | 82 | 6724 | 9.055 | 28.636 | .01220
33 | 1089 | 5.745 | 18.166 | .03030 | 83 | 6889 | 9.110 | 28.810 | .01205
34 | 1156 | 5.831 | 18.439 | .02941 | 84 | 7056 | 9.165 | 28.983 | .01190
35 | 1225 | 5.916 | 18.708 | .02857 | 85 | 7225 | 9.220 | 29.155 | .01176
36 | 1296 | 6.000 | 18.974 | .02778 | 86 | 7396 | 9.274 | 29.326 | .01163
37 | 1369 | 6.083 | 19.235 | .02703 | 87 | 7569 | 9.327 | 29.496 | .01149
38 | 1444 | 6.164 | 19.494 | .02632 | 88 | 7744 | 9.381 | 29.665 | .01136
39 | 1521 | 6.245 | 19.748 | .02564 | 89 | 7921 | 9.434 | 29.833 | .01124
40 | 1600 | 6.325 | 20.000 | .02500 | 90 | 8100 | 9.487 | 30.000 | .01111
41 | 1681 | 6.403 | 20.248 | .02439 | 91 | 8281 | 9.539 | 30.166 | .01099
42 | 1764 | 6.481 | 20.494 | .02381 | 92 | 8464 | 9.592 | 30.332 | .01087
43 | 1849 | 6.557 | 20.736 | .02326 | 93 | 8649 | 9.644 | 30.496 | .01075
44 | 1936 | 6.633 | 20.976 | .02273 | 94 | 8836 | 9.695 | 30.659 | .01064
45 | 2025 | 6.708 | 21.213 | .02222 | 95 | 9025 | 9.747 | 30.822 | .01053
46 | 2116 | 6.782 | 21.448 | .02174 | 96 | 9216 | 9.798 | 30.984 | .01042
47 | 2209 | 6.856 | 21.679 | .02128 | 97 | 9409 | 9.849 | 31.145 | .01031
48 | 2304 | 6.928 | 21.909 | .02083 | 98 | 9604 | 9.899 | 31.305 | .01020
49 | 2401 | 7.000 | 22.136 | .02041 | 99 | 9801 | 9.950 | 31.464 | .01010
50 | 2500 | 7.071 | 22.361 | .02000 | 100 | 10000 | 10.000 | 31.623 | .01000
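
Any row of Table XXI can be regenerated directly, which is a convenient check on a doubtful entry. The following is a minimal sketch; the helper name `table_row` is illustrative only and the rounding simply mirrors the number of decimals printed in the table.

```python
import math


def table_row(n):
    """Return (n, n^2, sqrt(n), sqrt(10 n), 1/n) rounded as in Table XXI."""
    return (
        n,
        n * n,
        round(math.sqrt(n), 3),
        round(math.sqrt(10 * n), 3),
        round(1 / n, 5),
    )


# Spot-check a few rows against the printed table.
for n in (2, 10, 33, 35):
    print(table_row(n))
```

For a value of n outside the table, $\sqrt{n}$ may be obtained from the $\sqrt{n}$ or $\sqrt{10n}$ column by shifting the decimal point, since $\sqrt{100n} = 10\sqrt{n}$.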


TABLE XXIII Random Numbers

[Table body illegible in this copy: two pages of five-digit random numbers, arranged in 60 numbered lines and columns (1) to (14).]

Taken from the table of 105,000 random digits prepared by the Bureau of Transport Economics and Statistics of the Interstate Commerce Commission, Washington, D.C., Mr. W. H. S. Stevens, Director. Used in this text by permission.


TABLE XXIV Pre-registration Scores of 447 College Students on the Cooperative Test Service English Test (Data supplied by Dr. Irving Lorge)


Subject Score | Subject Score | Subject Score | Subject Score | Subject Score
1 141 | 41 133 | 81 102 | 121 156 | 161 093
2 120 | 42 146 | 82 138 | 122 098 | 162 094
3 15 | 43 138 | 83 132 | 123 085 | 163 119
4 156 | 44 138 | 84 118 | 124 115 | 164 139
5 122 | 45 128 | 85 135 | 125 129 | 165 166
6 128 | 46 127 | 86 100 | 126 143 | 166 108
7 112 | 47 142 | 87 111 | 127 108 | 167 103
8 178 | 48 173 | 88 169 | 128 087 | 168 092
9 120 | 49 109 | 89 124 | 129 121 | 169 175
10 160 | 50 131 | 90 134 | 130 187 | 170 160
11 104 | 51 157 | 91 135 | 131 179 | 171 152
12 088 | 52 111 | 92 180 | 132 140 | 172 140
13 100 | 53 114 | 93 102 | 133 136 | 173 177
14     | 54 117 | 94 117 | 134 086 | 174 133
15     | 55 115 | 95 085 | 135 175 | 175 111
16 108 | 56 155 | 96 169 | 136 120 | 176 114
17 147 | 57 190 | 97 127 | 137 175 | 177 106
18 127 | 58 159 | 98 128 | 138 133 | 178 147
19 156 | 59 141 | 99 131 | 139 107 | 179 090
20 201 | 60 100 | 100 167 | 140 119 | 180 167
21 131 | 61 125 | 101 118 | 141 135 | 181 156
22 174 | 62 116 | 102 142 | 142 100 | 182 130
23 096 | 63 148 | 103 143 | 143 102 | 183 143
24 140 | 64 169 | 104 094 | 144 138 | 184 142
25 102 | 65 139 | 105 17 | 145 129 | 185 151
26 090 | 66 145 | 106 114 | 146 134 | 186 108
27 177 | 67 084 | 107 091 | 147 145 | 187 134
28 125 | 68 180 | 108 091 | 148 148 | 188 120
29 164 | 69 139 | 109 120 | 149 155 | 189 124
30 150 | 70 166 | 110 119 | 150 124 | 190 176
31 181 | 71 159 | 111 135 | 151 109 | 191 170
32 118 | 72 126 | 112 158 | 152 103 | 192 120
33 192 | 73 177 | 113 138 | 153 113 | 193 165
34 169 | 74 165 | 114 123 | 154 135 | 194 102
35 117 | 75 174 | 115 199 | 155 117 | 195 109
36 152 | 76 173 | 116 111 | 156 123 | 196 116
37 176 | 77 149 | 117 105 | 157 101 | 197 145
38 089 | 78 173 | 118 100 | 158 107 | 198 136
39 151 | 79 093 | 119 158 | 159 108 | 199 099
40 148 | 80 118 | 120 103 | 160 093 | 200 163


Subject Score | Subject Score | Subject Score | Subject Score | Subject Score
201 108 | 251 098 | 301 130 | 351 095 | 401 068
202 144 | 252 104 | 302 140 | 352 185 | 402 078
203 117 | 253 128 | 303 113 | 353 095 | 403 063
204 166 | 254 131 | 304 189 | 354 160 | 404 077
205 115 | 255 099 | 305 152 | 355 133 | 405 082
206 089 | 256 178 | 306 105 | 356 120 | 406 058
207 128 | 257 119 | 307 111 | 357 149 | 407 063
208 165 | 258 114 | 308 109 | 358 137 | 408 070
209 176 | 259 090 | 309 139 | 359 137 | 409 058
210 148 | 260 129 | 310 110 | 360 115 | 410 053




Glossary of Symbols 


The letters X and Y (and Z when a third letter is needed) are here customarily
used to denote random variables and observations on individuals. The other
English letters, capital or small, are used as convenience may require to denote
constants in an equation, or to name groups or categories within groups. Letters
at the first of the alphabet are used most often for this purpose. Such casual uses,
to which one letter might be put as well as another, will not be included in this
glossary. Small letters from the middle of the alphabet, especially i and j but
also h, k, and l, are used as variable subscripts. (See pages 112 and 197.)

Parameters are customarily denoted by Greek letters and sample statistics
by English. However, certain exceptions have been made, as in the use of $\chi^2$
to denote a statistic and of F, P, and Q to denote the frequency and proportion
in a population.

The use of multiple subscripts has been treated on page 196. The use of sub- 
scripts in multiple and partial correlation and regression is discussed in Chapter 13. 

A bar over any letter indicates the sample mean of the variable denoted by 
the letter. 

Greek letters and English letters are listed separately according to their re- 
spective alphabets. Symbols of operation are listed separately. 


a (1) Constant term in regression equation; (2) observed frequency
in one class of double dichotomy.

$a_i$ (1) Frequency in ith row and first column of a sample in which
horizontal trait is dichotomous, or similarly for reversal of rows and
columns (see page 99); (2) constant term in the regression equation
for the ith group.

A (1) Constant term in regression equation; (2) $\sum a_i$; (3) alternative
to a stated hypothesis; (4) arbitrary origin.

b Frequency observed in one class of a double dichotomy.

$b_i$ (1) Frequency in ith row and second column of a sample in which
horizontal trait is dichotomous (see page 99); (2) regression coefficient
for the ith group.

$b_{yx}$ Regression coefficient in the sample equation to estimate y from x,
or Y from X (similarly $b_{xy}$, $b_{12}$, etc.).

$b^*_{yx}$ Analogous to $b_{yx}$ except that each variable is expressed as a multiple
of its standard deviation, or $b_{yx} s_x / s_y$.

$b_{12.34}$ Coefficient for the term involving $x_2$ (or $X_2$) in the sample regression
equation to estimate $x_1$ (or $X_1$) from $x_2$, $x_3$, and $x_4$ (or $X_2$, $X_3$, and
$X_4$). (See Chapter 13 for use of other subscripts.)

$b^*_{12.34}$ Analogous to $b_{12.34}$, except that each variable is expressed as a multiple
of its standard deviation, or $b_{12.34} s_2 / s_1$.

B (1) $\sum b_i$; (2) a term in Bartlett's test for homogeneity of variance.

B′ Approximation to B in Bartlett's test.


Craw 


Crys 


Crp 


Cao 


Ст 


Cur 




(1) Observed frequency in one class of a double dichotomy; (2) the 
number of columns in a layout of rows and columns. 

(1) Coefficient of contingency; (2) a class or category, usually with 
subscripts to indicate which class; (3) the mean square among 
columns as in Chapter 13; (4) sum of squares or sum of products 
as used in Chapter 15; (5) elements of the inverse matrix in multiple 
regression. 

The sum of the squares of deviations from the mean, or $\sum (X - \bar{X}_i)^2$
for the ith group or category, and similarly for Y. (See Chapter 15.)

Sum of squares between groups, or $\sum_{i=1}^{k} N_i (\bar{X}_i - \bar{X})^2$, and similarly
for Y.

Sum of squares within groups, or $\sum_{i=1}^{k} \sum_{\alpha=1}^{N_i} (X_{i\alpha} - \bar{X}_i)^2$, and similarly
for Y. (See Chapter 15.)

The sum of products $\sum (X_{i\alpha} - \bar{X}_i)(Y_{i\alpha} - \bar{Y}_i)$ for the ith group or
category. (See Chapter 15.)

Sum of cross products for means, or $\sum_{i=1}^{k} N_i (\bar{X}_i - \bar{X})(\bar{Y}_i - \bar{Y})$.
(See Chapter 15.)

Sum of cross products within groups, or $\sum_{i=1}^{k} \sum_{\alpha=1}^{N_i} (X_{i\alpha} - \bar{X}_i)(Y_{i\alpha} - \bar{Y}_i)$.
(See Chapter 15.)

Sum of squares for total group, or $\sum_{i=1}^{k} \sum_{\alpha=1}^{N_i} (X_{i\alpha} - \bar{X})^2$, and similarly
for Y. (See Chapter 15.)

Sum of cross products for total group, or $\sum_{i=1}^{k} \sum_{\alpha=1}^{N_i} (X_{i\alpha} - \bar{X})(Y_{i\alpha} - \bar{Y})$.
(See Chapter 15.)

$C(A < \mu < B) = 1 - \alpha$ A statement made with confidence coefficient $1 - \alpha$
that $\mu$ lies between A and B, and similarly for statements about
other parameters.

(1) An error of measurement, as in Chapter 12; (2) ил — ин as 
on page 163; (3) X; — X as on page 128; (4) observed frequency 
in one class of a double dichotomy. In some texts d indicates the 
deviation of a score from its mean but it is not so used here. 
Degrees of freedom. See also n. 

(1) Weighted difference between means as on page 399; (2) difference 
between two scores of same individual as on page 152; (3) maximum 
vertical difference between two cumulative frequency polygons in 
K-S test. 

(1) A mathematical constant approximately equal to 2.718 occurring 
frequently in equations of probability distributions; (2) an error of 
measurement as used in Chapter 12. 

Mean square for error as used in Chapter 14. 




Е. Бу 


Mi 


Pi 
P 
P; 


P(X is in C;) 


Correlation ratio in a sample. (In some texts the symbols $\eta_{yx}$ and
$\eta_{xy}$ are used, but here the latter denote population values of the
correlation ratio.)

The expectation of, or expected value of, whatever variable is denoted
within the parentheses, as $E(X)$, $E(X - \mu)^2$, $E(p)$, etc. (see page 27).
Frequency in a sample. 

(1) Frequency in a population; (2) variance ratio in a sample,
ordinarily used with a subscript to denote the probability under
some hypothesis of obtaining a smaller value by chance.

The 95th percentile of the variance ratio distribution, and similarly 
for other subscripts. 

The ratio of the largest to the smallest variance (see page 192). 
Test statistic for comparing samples by means of the sums of ranks. 
The hypothesis that the mean of a population is equal to A, and 
similarly for other hypotheses. 

A variable subscript. 

(1) A variable subscript (see pages 112 and 197); (2) the width of 
a class interval. 

A variable subscript. 

(1) The number of groups, classes, variables, strata, and the like; 
(2) a variable subscript. 

Weight given ith variable, used on page 356. 

Weight given ith variable, used on page 356. 

(1) The number of terms accumulated in one tail of the binomial dis- 
tribution as used in Table IV; (2) the number of items in a test; 
(3) the number of clusters in a cluster sample, as used on page 176. 
(1) The number of cases in a finite population (see page 167); (2) the 
number of clusters in a finite population (see page 176). 

(1) The number of degrees of freedom; (2) the number of cases in 
a subclass (occasional use). 

The number of degrees of freedom for the numerator mean square of 
a variance ratio. 

The number of degrees of freedom for the denominator mean square 
of a variance ratio. 

The number of cases in a sample, sometimes used with a subscript 
to indicate a subsample. 

Factorial N (see page 18). 


The number of combinations of N things taken r at a time. 


The proportion of elements in one class of a dichotomous sample. 
The proportion of elements in ith class of a sample. 
The proportion of elements in one class of a dichotomous population. 
The proportion of elements in ith class of a population. 

Probability that X is in the ith class. (See page 13 for variants.) 


$P(X_2 \text{ is in } C_j \mid X_1 \text{ is in } C_i)$ Probability that $X_2$ will fall in the jth class if $X_1$
falls in the ith class. (See page 14 for variants of this.)


P(A < x < B) Probability that x will fall between the fixed values A and B.


q 


р 


R; 
RaR 


Rey Rin 




1-Р 

(1) Coefficient of correlation; (2) the number of rows in a layout 
of rows and columns. 

Biserial correlation. 

Point biserial correlation. 

Tetrachoric correlation. 

Coefficient of multiple correlation. (See Chapter 13 for use of sub- 
scripts. See R.,. below). 

Coefficient of partial correlation (see Chapter 13 for use of subscripts). 
Coefficient of reliability. 

Spearman-Brown coefficient of reliability when test length is in- 
creased p times. 

Estimate of correlation between two tests when one is lengthened p
times and the other q times.

(1) Rank order correlation; (2) row mean square in analysis of 
variance. 

Sum of ranks for group i.

The number of correct responses to the ith item among the upper 
and lower groups, when these groups have been selected on the basis 
of total test score. 

Coefficient of multiple correlation. (See Chapter 13 for use of sub- 
scripts. Also written Туу, 71.23.) 

(1) Standard deviation of a sample, used often with a subscript to
name the variable; (2) standard error of a statistic as estimated
from sample values, used with a subscript to name the statistic,
as $s_{\bar{X}}$, $s_p$, etc. (see $\sigma$).

The standard error of estimate for a sample (see $s^2_{y.x}$).

Variance, or square of the standard deviation, with meanings analogous
to those stated above for s.

Variance of y-residuals from regression line to estimate y from x,
and similarly for other subscripts. (See Chapter 10 for formula
which differs slightly from formula in some other texts.)

Sum of squares used in analysis of variance and covariance. For 
meaning of Sw, S», 8, Si, S», S; and Ss, see Chapter 15. 

(1) Ratio of a variable with unit normal distribution to the sample 
estimate of the standard error of that variable; (2) the number of 
observations in a tied set, as in Chapter 18. 

$(t - 1)t(t + 1)$ where t is the number of observations in a tied set.
The total of all scores in group i, or $\sum X$.

The total sum of deviations of coded scores in group $ from an arbi- 
trary origin, or Т, = УМ. 

(1) Ordinate of unit normal curve; (2) statistic into which the vari- 
ance ratio F is transformed, as in Chapter 17. 

The number of runs in a sample (see Chapter 18). 

Width of a confidence interval. 

Coefficient of concordance. 

Deviation of a score from its sample mean, $x = X - \bar{X}$. (Similarly
for other letters.)




2 
X 


a 
[e 
[os 
Ти» Ney 
ш 


[37] 
Eas бло, ete. 


т 
р 


Pzz Puy 
Pisas 


pi.24 


Deviation of a score from an arbitrary origin, $x' = X - A$.
A gross score. (Similarly for other letters.) 
Mean of a sample. (Similarly for other letters.) 
Sample estimate computed from a regression equation. 
Median of a sample. 
25th percentile of a sample. 
Test score of an individual in the upper or lower group. 

Analogous to x, 2’, X, 2, X. 
Ordinate of population distribution on a continuous variable. 
A variable with unit normal distribution, or an abscissa of the
normal probability curve. (In other texts this letter is sometimes
used for an ordinate of the normal probability curve but is not so
used here.)
Transformation used in testing significance of a correlation coefficient.
Transformation used in testing significance of a biserial coefficient
of correlation.
(1) Level of significance; risk of Type I error; (2) subscript attached
to a statistic (as $z_\alpha$, $t_\alpha$, $\chi^2_\alpha$) to indicate probability of obtaining
a smaller value of the statistic by chance; (3) subscript attached to
X, x, Y, or y to indicate that the observation is for the $\alpha$th individual.
(1) Probability of accepting a hypothesis when an alternative is
true, risk of Type II error; (2) population regression coefficient.
(For subscripts used with $\beta$ see Chapters 10 and 13.)
Population regression coefficient when variables are expressed as
multiples of their standard deviations. (Subscripts same as for $\beta$.)
Population value corresponding to the statistic z,. 
Population value corresponding to z*. 
Correlation ratio in the population. 
Population mean. (Subscript may be used to indicate the variable 
or statistic of which it is the mean.) 
Median of a population (see Chapter 16). 
Percentile of a population, the percent of cases with lower scores 
being indicated by the subscript (see Chapter 16). 
3.1416, or the ratio of the circumference of a circle to its diameter.
Correlation coefficient in a population. (In some other sources it 
is used to denote a coefficient of rank order correlation but is not so 
used here.) 
Reliability coefficient in a population. 
Coefficient of partial correlation in a population. (For other sub- 
scripts, see Chapter 13.) 
Coefficient of multiple correlation in a population, correlation of $x_1$
with regression estimate of $x_1$ made from best linear combination
of $x_2$, $x_3$ and $x_4$. (For other subscripts, see Chapter 13.)
(1) Standard deviation of a population, often used with a subscript
to name the variable as $\sigma_x$, $\sigma_y$; (2) standard error of a statistic, used
with a subscript to name the statistic, as $\sigma_{\bar{X}}$, $\sigma_p$, $\sigma_s$, etc.; (3) standard
error of estimate for a population, that is, the standard deviation of residuals
from a regression equation, used with subscripts as $\sigma_{1.2}$, in which
the primary symbol before the period names the variable estimated
from the regression equation and the secondary subscripts following
the period name the independent variables employed in the equation.
Variance of a population, with subscripts as for $\sigma$.

The sum of. 


$X_a + X_{a+1} + \cdots + X_b$ (see pages 111 and 197).


(1) A measure of relation based on order, discussed in Chapter 11; 
(2) the “true score” of an individual on X, discussed in Chapter 12. 
The “true score” of an individual on Y, discussed in Chapter 12. 
(1) A measure of relationship between two dichotomous variables, 
discussed in Chapter 11; (2) a transformation of percents to angles, 
described in Chapter 16. 

Chi-square. 

The fifth percentile of the $\chi^2$ distribution. (In other sources this
symbol may mean the 95th percentile.)

Chi-square computed from ranks. 

Chi-square after application of Yates’ correction. 


X is greater than Y, or Y is less than X.

X is greater than or equal to Y; X is not less than Y; Y is less than
or equal to X; or Y is not greater than X.

Is equal to. 

Is not equal to. 

Is approximately equal to. 

The absolute or numerical value of a, the sign being taken as positive. 
The positive square root of a. 

The positive square root of a. 
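
The sums of squares defined in this glossary (for each group, between groups, within groups, and for the total group) are related by the identity that the between-groups and within-groups sums add to the total. The sketch below illustrates this with made-up scores; the data and all variable names are hypothetical and do not come from the text.

```python
# Hypothetical scores for k = 2 groups (illustrative data only).
groups = [[3, 5, 7], [6, 8, 10, 12]]


def mean(values):
    return sum(values) / len(values)


all_scores = [x for g in groups for x in g]
grand_mean = mean(all_scores)

# Between groups: sum over groups of N_i * (group mean - grand mean)^2
between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)

# Within groups: sum over groups and individuals of (score - group mean)^2
within = sum((x - mean(g)) ** 2 for g in groups for x in g)

# Total group: sum over all individuals of (score - grand mean)^2
total = sum((x - grand_mean) ** 2 for x in all_scores)

print(round(between, 4), round(within, 4), round(total, 4))
```

The same decomposition holds for the sums of cross products when each squared deviation is replaced by the product of an X deviation and the corresponding Y deviation.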


Answers to Problems 


Exercise 2.1, page 17 

1. b. P(X = 1) = P(X = 2) = ... = P(X = 6)

c. P(X = 1 or X = 2) = P(X = 1) + P(X = 2) = 1/3

d. P(X = 1) + P(X = 3) + P(X = 5) = 1/6 + 1/6 + 1/6 = 1/2

f. P(X ≠ 6) = 1 − P(X = 6) = 1 − 1/6 = 5/6

a. The probability that 3 will appear on both throws. 

b. The probability that the sum of the numbers appearing on the two throws 
will be 2. 

c. The probability that the sum of the numbers appearing on the two throws 
will be 3. 

d. The probability that the second number will be 5 if the first number is 6. 

е. The probability that the second number will be 6 when the first number is 6. 

f. The probability that both numbers will be odd. 

g. The probability that both numbers will be smaller than 6. 

h. The probability that the sum of the two numbers will be 11. 
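
The arithmetic in answer 1 can be verified with exact fractions. This sketch assumes a fair six-sided die, which is what answer 1b (all six probabilities equal) implies; the variable names are illustrative.

```python
from fractions import Fraction

# Assumption: a fair die, so each face has probability 1/6.
p = {face: Fraction(1, 6) for face in range(1, 7)}

p_one_or_two = p[1] + p[2]        # mutually exclusive events add
p_odd = p[1] + p[3] + p[5]
p_not_six = 1 - p[6]              # complement rule

print(p_one_or_two, p_odd, p_not_six)
```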


Exercise 2.2, page 19 
2. a. 120 b. 210 c. 35 d. 10 e. 6 f. 16 g. 35 h. 432
Exercise 2.3, page 26 
1. $(.7 + .3)^4$ = .240 + .412 + .265 + .076 + .008 = 1.001
2. a. $(.5 + .5)^3$ = .125 + .375 + .375 + .125 = 1.000
$(.5 + .5)^6$ = .016 + .094 + .234 + .312 + .234 + .094 + .016 = 1.000
$(.5 + .5)^{10}$ = .001 + .010 + .044 + .117 + .205 + .246 + .205 + .117
+ .044 + .010 + .001 = 1.000
4. a. $(.4 + .6)^3$ = .064 + .288 + .432 + .216 = 1.000

b. $(.4 + .6)^6$ = .004 + .037 + .138 + .276 + .311 + .187 + .047

c. $(.4 + .6)^{10}$ = .000 + .002 + .011 + .042 + .111 + .201 + .251 + .215
+ .121 + .040 + .006
$(.8 + .2)^3$ = .512 + .384 + .096 + .008
$(.8 + .2)^6$ = .262 + .393 + .246 + .082 + .015 + .002 + .000
$(.8 + .2)^{10}$ = .107 + .268 + .302 + .201 + .088 + .026 + .006 + .001
+ .000 + .000
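
The expansions above can be reproduced term by term. The following is a minimal sketch using the binomial coefficient; the function name `binomial_terms` is illustrative, not the book's notation.

```python
from math import comb


def binomial_terms(a, b, n):
    """Terms of the expansion of (a + b)^n, in order of increasing power of b."""
    return [comb(n, k) * a ** (n - k) * b ** k for k in range(n + 1)]


terms = binomial_terms(0.7, 0.3, 4)
rounded = [round(t, 3) for t in terms]
print(rounded)
print(round(sum(rounded), 3))  # sum of the rounded terms
print(sum(terms))              # sum of the unrounded terms
```

The unrounded terms sum to 1; a total such as 1.001 in answer 1 arises only because each term was rounded to three decimals before adding.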
Exercise 2.5, page 30 


1. a. E(Np) = NP = 10(.5) = 5 b. No; No. c. Yes. E(Np) = 20(.5) = 10
d. $\sigma^2_{Np}$ = NPQ = $10(.5)^2$ = 2.5; 500 observations are actually a sample; 1000
observations provide better approximation than 500.
e. Yes. $\sigma^2_{Np}$ = $20(.5)^2$ = 5.0
2. c. Different only because of sampling error. 
d. No; 500 observations are subject to sampling error. 
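
The expected value and variance quoted in answer 1 follow from the formulas E(Np) = NP and variance NPQ, with Q = 1 − P. A short sketch (the function name is illustrative):

```python
def binomial_mean_variance(n, p):
    """Mean N*P and variance N*P*Q of the number of successes in n trials."""
    q = 1 - p
    return n * p, n * p * q


print(binomial_mean_variance(10, 0.5))
print(binomial_mean_variance(20, 0.5))
```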




Exercise 2.6, page 36 


3. a. .12 b. .14 c. .99 d. .05 e. .99
f. .98 g. .05 h. .22 i. .03 j. .85
k. .10 l. .60 m. .005 n. .002 o. .06

4. a. −1.037 b. −1.751 c. .385 d. .6745 e. 1.645
f. 1.960 g. 2.576 h. −2.576 i. 1.96 j. .6745
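
Normal deviates such as those in answer 4 are ordinarily read from a table of the normal curve, but they can also be computed. The sketch below uses Python's standard library; the particular percentiles chosen are illustrative, since the wording of the exercise is not reproduced here.

```python
from statistics import NormalDist

unit_normal = NormalDist()  # mean 0, standard deviation 1


def z_for(proportion_below):
    """Abscissa z below which the given proportion of the unit normal curve lies."""
    return unit_normal.inv_cdf(proportion_below)


for proportion in (0.75, 0.95, 0.975, 0.995):
    print(proportion, round(z_for(proportion), 4))
```

By symmetry, the deviate for a proportion below one half is the negative of the deviate for its complement, so `z_for(0.005)` equals `-z_for(0.995)`.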


Exercise 3.1, page 51 
1. If p = .4, accept all stated hypotheses except P = .9.
If p = .5, accept all stated hypotheses except P = .16 and P = .94.
If p = .2, reject all stated hypotheses except P = .04 and P = .49.
2. If p = .4, range is .12 < P < .75.
If p = .5, range is .18 < P < .82.
If p = .2, range is .02 < P < .55.


Exercise 3.2, page 57 


5. a. .41 < P < .82 b. .56 < P < .74 c. .59 < P < .71
d РАБ е. 14 <Р < 40 f. .22 <Р < 28 

6. As N increases, width of interval decreases. 

7.205 « P'« М 8. .65 <Р < .83 


Exercise 3.3, page 59 
2. a. X ≤ 3 and X ≥ 11 b. X < 4 and X > 14 c. X < 7 and X > 18


Exercise 3.4, page 63 


3. a. HJ b. HK c. AB d. AC e. JM
f. KM g. BD h. CD i. BD j. CD
4. α = .02; α = .11 5. P = .80 6. α = .11
7. α = ordinate at point H; β = 1 − ordinate at alternative to H.
8. α = .172. See A of Figure 3-5. 9. α = .172. See C of Figure 3-5.
Exercise 3.5, page 66 
8. И ESTY. 4. more satisfactory. 
5. a. about .05 b. about .2 с. about .4 
d. about .95 e. about .8 f. about .6 
6.4; C 1. С.А 
                              Risk of error
10. Situation   Decision      α       β
    1           R             .071    0
    2           A             0       X
    3           R             .115    0
    4           A             0       X
    5           A             0       X




Exercise 3.6, page 76 


1. .36 < P < .48
2. Yes; p = .48; z = −3.46; $z_{.02}$ = −2.05; Reject H: P = .60
3. N = 721 4. N = 63 5. N = 8131


Review Exercise, page 79


3. RII; АГ; RII; AI; АТ 


Exercise 4.1, page 92 
1. a. The probability of obtaining a random sample which will yield a value of
$\chi^2$ less than 9.9 in a situation in which there are 21 degrees of freedom is .02.
b. In $\chi^2$ problems involving 25 degrees of freedom, 90% of random samples
may be expected to yield values of $\chi^2$ between 14.6 and 37.7.
m. The probability that a random sample will show a value of $\chi^2$ larger than 42
if the number of degrees of freedom is 22 cannot be read exactly from Table
VIII but that probability is between .005 and .01; there is probability between
.005 and .01 of obtaining $\chi^2$ > 42 if the degrees of freedom are 22;
between .5% and 1% of all random samples may be expected to produce a
value of $\chi^2$ larger than 42 if the appropriate number of degrees of freedom
is 22.
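2. and 3.   $\chi^2_{.95}$      $\chi^2_{.99}$
b. 14.1 R/A: A    18.5 R/A: A
c. 14.1 R/A: R    18.5 R/A: A
d. 14.1 R/A: R    18.5 R/A: R
e. 25.0 R/A: A    30.6 R/A: A
f. 37.7 R/A: R    44.3 R/A: R
g. 43.8 R/A: A    50.9 R/A: A
h. 26.3 R/A: R    32.0 R/A: A
i. 19.7 R/A: R    24.7 R/A: A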
dor ale ME 1597 64 


Exercise 4.2, page 94
1. a. n = 3, N = 10 b. n = 3, N = 50
2. Better fit in (b) because N is larger.

3. Number of heads: 0 1 2 3 4
Expected frequency: 4 16 24 16 4
Observed frequency: 2 12 23 20 7
$\chi^2$ = 5.29; $\chi^2_{.95}$ = 9.5; No 4. $\chi^2$ = 10.58


5. $\chi^2$ = 5.29r. Multiplying observed and theoretic frequencies by r makes $\chi^2$
r times as large.
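
The value of $\chi^2$ in answer 3, and the effect of multiplying all frequencies by a constant noted in answers 4 and 5, can be checked from the frequencies listed above. This sketch uses the usual formula, the sum of (observed − expected)² / expected; the function name is illustrative.

```python
def chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


expected = [4, 16, 24, 16, 4]
observed = [2, 12, 23, 20, 7]

chi2 = chi_square(observed, expected)
print(round(chi2, 2))

# Multiplying every frequency by r multiplies chi-square by r.
r = 2
chi2_scaled = chi_square([r * o for o in observed], [r * e for e in expected])
print(round(chi2_scaled, 2))
```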


Exercise 5.1, page 113 


9 8 8 8 8 
2. а. Nx b. Уа =4 Ya с. У зау =3 Уз 
$=6 isd i=5 i-6 i= 
8 8 
d. (Y. 4) = XY: -44 e. Ух, 
i=5 i=5 ї=1 




Exercise 5.1, page 113 cont. 


3. a. 60 b. 5 c. 0 d. 26 e. 12
f. 42 g. 625 h. 5 i. 125 j. +26
4. a. 2 b. 6 c. 424 d. 129 e. 9
f. 64 g. 64 h. 26 i. 26


Exercise 5.2, page 117 
2. $\bar{X}$ = 81.3, s = 5.40
8a X=11,8=355 b.XY=15,8=299 с. X = 2.57, # = 6.95 


Review Exercise, page 123 
ame b. uou CX,5x,X-u Gd 
с, does not fall into any of the classes named, for its value depends upon sample 
size as well as population P. 


Exercise 7.1, page 144 


1. a. 2.5 b. 1 с. .5 d. .05 
е. .4 САВ 5.2 h. 4 
2. .044; .913; .0001; .9998 
Exercise 7.2, page 146 
4. a. t.05 = -2.35 and t.95 = 2.35 when n = 3
   b. t.05 = -1.75 and t.95 = 1.75 when n = 16
   c. t.05 = -1.645 and t.95 = 1.645 when n = 500
5. t.10 = -3.08 and t.90 = 3.08 when n = 1
   t.10 = -1.37 and t.90 = 1.37 when n = 10
   t.10 = -1.282 and t.90 = 1.282 when n = 400
6. a. .20 < P < .40 when n = 21    b. .05 < P < .10 when n = 3
   c. .001 < P < .01 when n = 500


Exercise 7.3, page 151 
3. 95%; all 4 or 100% 
90%; all 4 or 100% 
50%; 3 or 75% 
10%; 1 or 25% 
4. C(25.6 < μ < 28.8) = .99; C(26.0 < μ < 28.4) = .95


Exercise 7.4, page 160 
1. t = 2.32. If α = .05, reject H: μ₁ = μ₂. If α = .01, accept H.
2. D = 148; sp = .754; t = 1.90; tss = 2.02 

If α = .05, accept H: μ₁ = μ₂


2 
3. Е ip, t= —= = 
> eae v3.35 
For balance, t = aa = 2, 


Exercise 7.5, page 169 
1. a. .158    b. .475    c. .053


2. a. .224    b. .5    c. .167



3. N = 750 

4. a. t = 4.4/3.92 = 1.12; t.995 = 3.25; accept H
   b. t = 4.4/4.1 = 1.07; accept H
   c. n = 9.54 - 2 = 7.5 or 7; accept H

7. a. t = -3.2/1.386 = -2.31; t.025 = -2.20; reject H
   b. Accept H without computation as X̄ < 35
   c. t = -2.308; t.05 = -1.80; reject H
   d. 28.75 < μ < 34.85
   e. 27.49 < μ < 36.11

8. a. (7.10), normal    b. (7.10), t, n = 5
   c. (7.21), normal    d. (7.21), t, n = 31
   e. (7.24), normal    f. (7.24), t, n = 11.6

9. a. (6)    b. (1)    c. (1) or (2)

10. (1) N = 292    (2) N₁ = N₂ = 420

    (4) N₁ = N₂ = 47


Exercise 8.1, page 183


8. 90%; 3 or 75% 
80%; 3 or 75% 
60%; 2 or 50% 
40%; 1 or 25% 

9. x? = 6800/237.16 = 28.07 > х?.% 
10. 4.53 < σ < 7.42


Exercise 8.2, page 187 


          F       F.95    F.99    Probability
2. a.    1.38     1.74    2.20    P > .10
   b.    2.88     1.95    2.56    P < .02
   c.    1.76     2.18    3.09    P > .10
   d.    4.78     2.90    4.73    P < .02
   e.    2.19     1.72    2.19    P = .02


Exercise 9.1, page 198 
See Table 9.1, page 199 


Exercise 10.1, page 242 
Data required for 1 and 2:
Ȳ = 50.83; (N - 1)s_y² = 2869.8; s_y² = 70.00; s_y = 8.367
X̄ = 47.10; (N - 1)s_x² = 5078.6; s_x² = 123.85; s_x = 11.13
s²_y.x = 45.75; s_y.x = 6.764; b_yx = .453; r_yx = .602

1. a. By (10.27), t = (.453 - 0)(11.13)√41 / 6.764 = 4.77
      By (10.28), t = .602√40 / √(1 - (.602)²) = 4.77
      t.995 = 2.70. Reject H: β_yx = 0


d. (6) 


(3) N₁ = 346, N₂ = 577
(5) N₁ = 44, N₂ = 87


   b. By (10.27), t = (.453 - .6)(11.13)√41 / 6.764 = -1.55
      t.975 = 2.02. Accept H: β_yx = .6
   c. By (10.25), t = (50.83 - 52)√42 / 6.764 = -1.12
      Accept H: μ_y = 52
   By (10.32), .261 < β_yx < .644
   By (10.31), 48.72 < μ_y < 52.94
   By (10.33), 30.9 < σ²_y.x < 75.0
   By (10.34), 45.5 < μ_y.x < 49.8
3. X̄ = 24.2; Ȳ = 69.0; b_yx = .217
   Ỹ = 63.8 + .217X; s²_y.x = 54.1
   -.125 < β_yx < .559
   64.6 < μ_y < 73.4
   33.3 < σ²_y.x < 103.2
   Accept H: β_yx = 0 and H: β_yx < 0; reject H: β_yx = b
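
The t values quoted for 1a and 1b can be rechecked from the summary figures given for this exercise. The sketch below is an illustration only: the two formulas are inferred from the arithmetic printed in the answers rather than copied from (10.27) and (10.28), N = 42 is inferred from the factor √41, and the names t_slope and t_corr are hypothetical.

```python
import math

# Summary figures quoted in the answers (N = 42 is inferred from sqrt(41)).
N = 42
b_yx = 0.453   # sample regression coefficient
s_x = 11.13    # standard deviation of X
s_yx = 6.764   # standard error of estimate
r = 0.602      # sample correlation

def t_slope(b, beta0):
    """t for H: beta = beta0, using (b - beta0) * s_x * sqrt(N - 1) / s_yx."""
    return (b - beta0) * s_x * math.sqrt(N - 1) / s_yx

def t_corr(r):
    """t for H: rho = 0, using r * sqrt(N - 2) / sqrt(1 - r^2)."""
    return r * math.sqrt(N - 2) / math.sqrt(1 - r ** 2)

t_1a = t_slope(b_yx, 0.0)   # test of beta = 0
t_1b = t_slope(b_yx, 0.6)   # test of beta = .6
t_r = t_corr(r)             # same hypothesis as 1a, via r

print(round(t_1a, 2), round(t_1b, 2), round(t_r, 2))
```

The slope test and the correlation test of a zero relationship give the same t apart from rounding, which is why both routes lead to the same decision in 1a.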

Exercise 10.2, page 252


N                          5       26      82      102     401
r = .06
  r√(N - 2)/√(1 - r²)     .104    .294    .538    .601    1.20
  r√(N - 1)               .120    .300    .540    .603    1.20
r = .20
  r√(N - 2)/√(1 - r²)     .354    1.00    1.83    2.04    4.08
  r√(N - 1)               .400    1.00    1.80    2.01    4.00
r = .60
  r√(N - 2)/√(1 - r²)     1.30    3.67    6.71    7.50    14.98
  r√(N - 1)               1.20    3.00    5.40    6.03    12.00


Highly satisfactory agreement for N large and r near zero.
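
The comparison in this exercise can be regenerated with a short Python sketch. The values of r (.06, .20, .60) and N (5, 26, 82, 102, 401) used here are assumptions inferred from the printed entries, and the function names are hypothetical.

```python
import math

def t_exact(r, n):
    """r * sqrt(N - 2) / sqrt(1 - r^2), the t statistic for H: rho = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

def z_approx(r, n):
    """r * sqrt(N - 1), the large-sample normal approximation."""
    return r * math.sqrt(n - 1)

r_values = (0.06, 0.20, 0.60)       # assumed from the printed rows
n_values = (5, 26, 82, 102, 401)    # assumed from the printed columns

for r in r_values:
    for n in n_values:
        print(f"r={r:.2f} N={n:3d} "
              f"t={t_exact(r, n):6.2f} z={z_approx(r, n):6.2f}")
```

The two statistics differ by the factor √((N - 2)/(N - 1))/√(1 - r²), which approaches 1 as N grows and r approaches zero.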


Exercise 10.3, page 257 
1. t = 1.33    2. t = 2.03


Exercise 10.4, page 258 
1. a. X̄ = 31.38; Ȳ = 49.15; b_yx = .652; s_x = 7.25
      s_y = 9.42; r_xy = .502; s_y.x = 8.19
   b. Ỹ = .652X + 28.69    c. 75%    d. t = 5.7


   e. If X = 62, Ỹ = 69.11; if X = 38, Ỹ = 53.47; if X = 54, Ỹ = 63.90

   f. (1) 53.17 < μ_y.x < 58.97    (2) .425 < β_yx < .879
      (3) .336 < ρ < .635          (4) 49.7 < σ²_y.x < 86.8
2. a. Ỹ = 20.16 + .812X    b. X̃ = 20.60 + .456Y    c. .439 < ρ < .737

   d. r = .6085; z_r = .707
      ρ = .6000; z_ρ = .693
      z = (.707 - .693)√119 = .153
      Accept H: ρ = .60
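
The test in 2d uses the z transformation of r. A minimal sketch of that computation in Python follows; N = 122 is an assumption inferred from the factor √119 = √(N - 3), and the variable names are hypothetical.

```python
import math

r = 0.6085      # sample correlation quoted in the answer
rho0 = 0.60     # hypothetical population value under H
n = 122         # assumed: the answer uses sqrt(119) = sqrt(N - 3)

# z transformation: z = (1/2) ln((1 + r)/(1 - r)) = atanh(r)
z_r = math.atanh(r)
z_rho = math.atanh(rho0)

# Normal deviate for H: rho = rho0
stat = (z_r - z_rho) * math.sqrt(n - 3)

print(round(z_r, 3), round(z_rho, 3), round(stat, 2))
```

Working from unrounded z values gives a slightly smaller deviate than working from the three-place table values, but in either case it is far inside ±1.96, so the hypothesis ρ = .60 is retained.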

„ф= — 14.6 


a. b_yx b_xy = (.96)(1.42) > 1.00
b. Because Dey > УХУ, r > 1.00
c. r, b_yx, and b_xy must all have same sign.
d. r² should be equal to b_yx b_xy, but .36 ≠ (.8)(.9).
e. s_y.x cannot be larger than s_y.
f. Same difficulty as in c.
g. b_yx = -.9 and b_xy = -.6. Same difficulties as in c and d.
h. b_yx = r s_y/s_x = (.5)(10)/4 = 1.25, not .25.
i. b_yx should be (.5)(3)/6 = .25 instead of 1.00.
   b_xy should be (.5)(6)/3 = 1.00 instead of .25.

(z_A - z_B)/√.0117 = -.68


Exercise 13.1, page 326 
1. Y = 22.08 + .214Х, + 477Х, 


2. ула = .642 
ч ЛАТА; ГЭ 
3. F= ао ® 33.3 


4. а. У = А, + bys. Xo + бит Xo + в Ха 
b. b* 2.67 Ar тиф js oz dis таф%ут.в = Ty 
тарат + Був. + Tb ут. = Тув 
тпру + теб": + 5*ут,» = Ти 


с. Ry. = Vib ys + туру ат ЕЕ Tyid* s os 


Тул — Ту Тув — Ту? 
5. a. в Е oR) b. b*, Н Жашы] 
l—ru шн ЖА, 1—т% 
Sy _ Ty — Гуа 8, 
с. byo = Вж 50 = Tit "Ге 8у =p, 8 
4.6 e осир вы d. bs. = b*y6.4 T 
e A= Nc bus — их f Rys = Утба + Tub ува 
g. buss + тиб ва = Ty Tub*js + Буса = т 
Tis — Түз Tz — 7137 
ж m UNES * LO A 13737 81 81 
6. а. 0*5; 1— 5 b. bu TNT ad с. bu; = Vui d. ла = бта 
е. А = Х, – ВХ: — by aX; f. Ris = Мтиб*з.л + ти$®ив 
Е. Бат + тиб* из = Ты Tab*i3.7 + b*12.3 = Ти 




Exercise 13.2, page 345 


ds ru Tes — Teslos В. тат Туз — Тутї зт 
Т М - МУ rs OMT = VT = rig 
Tu — Туту 


= 
б ү = 724.5 — 726.5746.5 
VI = Риз 1 — rus 
3. 743.2507 = enr НЕ 4. 143.2067 = bss. 2567ba4.2507 
У1 — rg. V 1 — 7287.246 
5. 1 — Аль = (1 — т) (1 — 72:21) (1 — 732) 
= (1 — 12) (1 — 72.6) (1 — r3) ete. 




Index to Subject Matter 


Acceptance, of false hypothesis, 61; region 
of, 46; values, 47, 50 

Addition of probabilities, 15 

Adjusted means in analysis of covariance, 
397 

Algebraic relations in analysis of covari- 
ance, 388; in analysis of variance, 210 

Alpha, error, 60; as subscript indicating 
individual, 230 

Alternatives to hypothetical mean, 161; 
to hypothetical proportion, 60 

Analysis of covariance, 387 ff.; adjusted
means, 397; hypothesis of common 
slope, 390; hypothesis of no difference 
between regression estimates, 398, 406; 
hypothesis of single regression line, 393; 
hypothesis of zero slope, 393; line of 
non-significance, 410; matched regres- 
sion estimates, 398, 406; point of non- 
significance, 399; predictor variables, 
387, 404; region of significance, 400, 407; 
regression among means, 396; regres- 
sion in separate groups, 390; regression 
in total group, 390; subdivision of 
cross-products, 387; subdivision of sum 
of squares, 396, 405; symbolism, 387 

Analysis of variance, 196 ff., 348 ff.; alge- 
braic relations, 210; component sum 
of squares, 208, 224; critical region, 205; 
error variance, 220, 358; F test, 204-08; 
Graeco-Latin square, 373; hypotheses, 
201, 349; interaction, 349; Latin square, 
373; main effects, 371; matched groups, 
382; mathematical model, 201, 360; 
orthogonal comparisons, 356; by ranks, 
438; in regression, 244, 324; scores not 
available, 226; several measures on 
each individual, 216, 363; subdivision 
of sum of squares, 211, 355; two bases 
of classification, 349; unequal frequen- 
cies in subclasses, 381; variance ratio, 
204 

Angular transformation, 423; Table XIX,
479 

Arcsine transformation, 423; Table XIX,
479 

Attenuation, 300 


Back solution, 328, 329, 335 
Bartlett’s test for homogeneity of variance, 
193 


Beta error, 60; regression weight, 239 

Biased estimate, 119, 240; sample, 171 

Binomial distribution, 21; aids for com- 
puting probabilities in, 57; Tables IV 
and V, 458-60; approximation by nor- 
mal curve, 31; graphic representation, 
20, 24, 25, 26, 38, 41, 47; mean of, 29; 
standard deviation of, 29 

Biserial correlation, 261, 267; comparison 
with point biserial, 271; transformation 
to normal variate, 269 

Bivariate normal distribution; see Normal 
bivariate distribution 


C values in analysis of covariance, 389; 
in solution of normal equations, 333 

Central limit theorem, 143 

Chi-square, 85; comparison with Kolmo- 
gorov-Smirnov test, 443; curves, 86; 
exact probabilities, 86; graphic repre- 
sentation, 87, 88, 89, 134, 135; observed 
and expected frequencies, 93; small ex- 
pectations, 106; table, 91; Table VIII, 
464; in tests of independence, 95; Yates’ 
correction, 105 

Classes, 9; exhaustive, 14; mutually ex- 
clusive, 14 

Clopper-Pearson chart, Chart VI, 461 

Cluster sampling, 175 

Coded scores, 237 

Comparisons, orthogonal, 356 

Components of a score, 293; of sum of 
squares in analysis of variance, 224, 351, 
355, 369; of sum of squares and prod- 
ucts in analysis of covariance, 387; of 
sum of squares in regression, 242, 322 

Computational routines: analysis of vari- 
ance, 353, 366, 377; chi-square test of 
normality, 121; coefficient of concord- 
ance, 283; coefficients of correlation 
and regression by use of a machine, 234; 
coefficients of correlation and regression
from a scatter diagram, 237; correlation
ratio, 279; Doolittle method in
multiple regression, 328-35; Kelley- 
Salisbury iterative method in multiple 
regression, 346; Kuder-Richardson 
method for reliability coefficient, 311; 
mean and standard deviation of a sam- 
ple from grouped data, 117 

Conceptual population, 9 




Concordance, coefficient of, 284, 439 

Confidence, coefficient, 53; interval, 53; 
limits, 53 

Confidence and probability, 55 

Conic, 407 

Consistency of estimate, 52, 414 

Contingency, coefficient of, 287 

Contingency table, 95; more than two 
classes in each variable, 96; same in- 
dividuals in both variables, 102; test of 
independence in, 95; two classes in each 
variable, 100; two classes in one varia- 
ble, 99; very small samples, 103 

Corner test of association, 447 

Correlation, biserial, 267; coefficient of 
contingency, 287; Flanagan’s estimate 
of, 275; multiple, 321; partial, 340; 
phi, 272; point biserial, 262; ranks by 
several judges, 283; ranks by two 
judges, 278; ratio, 276; tau, 286; tet- 
rachoric, 273 

Correlation coefficient (in bivariate popu- 
lation), 233; computation by machine, 
233; computation from scatter diagram, 
237; confidence interval for, 255; con- 
sistency among coefficients, 344; cor- 
rection for attenuation, 300; inefficient 
estimate of, 420; in normal bivariate 
population, 248; standard error of, 252; 
t test for, 251; test that two are equal,
255; z transformation, 254 

Covariance, 248; analysis of, see Analysis 
of covariance 

Critical region, 46; choice of, 62-65; one-sided
and two-sided, 64, 162; related to
power of test, 62 

Cumulative distribution, 26, 427 


Degrees of freedom in general, 90; chi-square,
90; F distribution, 185; random
sample, 130; "Student's" distribution,
146

Density, probability, 25 

Design of experiments, 210, 348, 363, 373, 
375; of samples in surveys, 171 

Dichotomous population, 19; scale, 261 

Difference between two correlation coefficients,
255-57; between two means,
254-0; between two proportions, 77-79

Discrete variable, 8 

Disproportionate subclass frequencies, 381 

Distribution, population, 9; bivariate, 
230, 246; multivariate, 337; normal, 
110; normal bivariate, 248; of several 
classes, 80; of two classes, 19 

Distribution, sampling, 5, 118; binomial, 
21; chi-square, 84, 132; correlation 
coefficient, 249, 251, 255; F, 140; non-central
t, 265; normal, 132; rank correlation,
281; "Student's" or t, 135
Distribution-free methods, 426; see also 
Non-parametric methods 
Doolittle method, 326; Fisher modifica- 
tion, 331 
Double subscripts, 196, 290 
Durost-Walker correlation chart, 239 


e, 110 

Efficiency, 144, 414 

Ellipses of bivariate normal distribution, 
250 

Error in measurement, 292; two kinds of, 
60; in analysis of covariance, 393, 395; 
in analysis of variance, 220, 358, 374, 
380, 382; see also Measurement error 

Estimation, 51; in analysis of variance, 
202, 220, 358, 361, 370; consistent, 51; 
efficient, 144; of error variance, 208; 
inefficient, 415; by an interval, 53; of 
mean, 144, 148-51, 416, 417; of point 
biserial correlation, 265; of regression 
statistics, 241, 339; of reliability coeffi- 
cient, 307-14; of p, 255, 270, 273, 275, 
420; by a single value, 51, 415; of 
standard deviation, 416, 417; of true 
variance, 297; unbiased, 51, 119; of 
variance, 119, 180 

Eta, 278 

Expectation, 27; of frequency, 29; of 
mean score in measurement, 290; of 
mean square, 226, 361, 362; of pro- 
portion, 30; of sample mean, 119; of 
sample variance, 119; of a variable, 
109; symbol for, 28 

Expectations, small, 106 

Expected value, see Expectation 

Experimental design, 216, 348, 362, 373,
375


F ratio, 140; in analysis of variance, 205; 
comparison of variances, 185; critical 
region, 186, 205; degrees of freedom, 
141; graphs, 141, 209, 210; independ- 
ence in, 140; relation to chi-square, 
189; relation to correlation ratio, 278; 
relation to t, 188; relation to z, 189;
tables, 185; Table X, 466; transformation
to normal, 424; reading tables of,
Fmax distribution, 192; Table VII, 462 

Factorial, 18 

Finite population, 70, 167 

Fisher and Yates Tables, 186, 195 

Flanagan's estimate of ρ, 275, 287; Table
XIII, 472 

Fourfold correlation, 271; contingency 
table, 100 

Frame, 171 


Graeco-Latin square, 373 
Graphical representation of acceptance 
values, 48; binomial, 20, 24, 25, 38, 41, 
47; chi-square distribution, 87, 89, 134; 
confidence interval for cumulative dis- 
tribution, 442; confidence interval for 
E(d), 446; confidence interval for μ,
150; confidence interval for μ_x - μ_y, 445;
confidence interval for P, Chart VI, 461;
confidence interval for ρ, Chart XIV, 476;
confidence interval for σ², 182; cumulative
binomial, 26, 41; cumulative
chi-square, 87, 88, 135; cumulative F,
141; cumulative normal, 41, 136; cumulative
"Student's" or t, 136, 138, 139;
F distribution, 131, 208, 209; inde- 
pendence of components of sum of 
squares in analysis of variance, 210; 
independence of X and s, 131; normal, 
38, 75, 120; normal bivariate ellipses, 
250; power function, 65, 73, 162; proba- 
bility in bivariate distribution, 247; 
region of rejection and rejection values 
for P, 48, 50; relations in analysis of 
covariance, 392, 394, 396, 398, 399, 404, 
409, 411; sampling distribution of r, 253 


High and low groups of test scores, 417; 
estimation of correlation from, 275, 420; 
Chart XV, 477; Table XIII, 472; estimation
of mean and standard deviation
from, 417 

Homogeneity of variance, 191-95 

Hypothesis, 4; choice of critical region, 
64, 162; critical region, 46, 62; level of 
significance, 46; one-sided and two- 
sided regions, 60; power of a test, 62, 
162; procedures in testing, 5; sample 
size in test of, 72, 163 

Hypothetical population, 10 


Independence, in contingency tables, 95- 
106; of components of sum of squares, 
208; of mean and standard deviation, 
130; of observations, 14; among ranks 
of several judges, 283; between ranks 
of two judges, 278, 286; non-parametric 
test, 446 

Inefficient estimate, 275, 415 

Interaction, 349, 362, 369; test that inter- 
action is zero, 353, 370; used as error, 
360 

Internal consistency of a test, 311 

Interval estimate, see Confidence, interval 

Inverse matrix, 332 

Iterative method in regression, 345 


Johnson-Neyman method, 384, 406 




k sample tests, analysis of variance, 196 ff., 
348 ff.; analysis of variance by ranks, 
438; homogeneity of variance in k 
samples, 191; median test, 435; sum of 
ranks test, 436 

Kelley-Salisbury method, 345 

Kolmogorov-Smirnov test, 426, 440; com- 
parison with chi-square test, 443 

Kuder-Richardson formula, 311 


Latin square, 373; interaction in, 373; 
several observations in each cell, 377 

Layout, three-way, 363; two-way, 360 

Least squares, 325 

Level of significance, 46 

Linear correlation, see Correlation coeffi- 
cient 

Linear function, 132 

Linearity of regression, test for, 244; see 
also Multiple regression 

Location, tests of, 430 

Logarithm, Table XXII, 482; transforma- 
tion, 424 


Matched groups, 382 

Matched regression estimates, 398, 406 

Mathematical model, 109; analysis of 
covariance, 387; analysis of variance, 
201, 217, 361, 362, 369; biserial correla- 
tion, 265; bivariate correlation, 246; 
point biserial correlation, 265; regres- 
sion, 239, 337 

Matrix, 196; explanation of Fisher- 
Doolittle method, 332 

Maximum likelihood, 415 

Mean, adjusted, in analysis of covariance, 
397; components of, in analysis of var- 
iance, 218, 349, 362, 369; computation 
in grouped data, 115; confidence inter- 
val for, 148; deviation, 420; difference 
of, see Difference between two means; es- 
timate based on percentiles, 416; Table 
XVII, 478; of a probability distribu- 
tion, 27; of a sample, 114; probability 
distribution of, 132, 144; standard error 
in cluster sampling, 144; standard error 
inrandom sampling, 144; standard error 
in sampling from finite universe, 167; 
standard error in stratified sampling, 
174; tests of hypotheses, 143 ff., 196 ff., 
443 

Mean square, 202; distribution of, 204; 
independence of, 208; population value 
estimated by, 362, 370; ratio of two, 205 

Measurement, 289 ff.; components of a 
score, 293; effect of error on corre- 
lation, 299; effect of error on the 
mean, 295; effect of error on regres- 
sion, 305; effect of error on tests of 




significance, 306; effect of error on the 
variance, 297; error, 292; reliability, 
297; symbolism, 291; ‘‘true” variance, 
297; variance, 298 

Median, 413; efficiency of, 415; interval 
estimate for, 440; standard error of, 
414; symbolism, 413 

Median test for k samples, 436; for two
samples, 435 

Multiple correlation, 321; computation, 
326; relation to partial correlation, 344; 
relation to regression, 322; test of hy- 
pothesis, 324 

Multiple regression, 318, 324; analysis of 
variance in, 324; effectiveness of pre- 
diction, 321; elimination of a predictor, 
339; normal equations, 325; partition 
of sum of squares, 322; tests of signifi- 
cance, 337; two predictors, 318 

Multiplication of probabilities, 15 


Non-central t distribution, 265

Non-parametric methods, 426 ff.; com- 
parison of k samples by analysis of 
variance on ranks, 438; by median test, 
435; by sum of ranks, 436; comparison 
of two samples by extended sign test, 
431; by Kolmogorov-Smirnov test, 427 ; 
by median test, 435; by run test, 428; 
by sign test, 430; by signed rank for 
paired observations, 432; by sum of 
ranks, 434; confidence interval for cumulative
frequency, 441; confidence
interval for E(d), 445; confidence interval
for median, 440; confidence
interval for μ_x - μ_y, 443; corner test of
association, 447 

Non-significance, line of, 410; point of, 
399 

Normal bivariate distribution, 248; el- 
lipses of equal probability, 249; param- 
eters in, 248; probability density of, 
248; regression equations in, 258 

Normal distribution, 32; approximation 
to binomial, 37; to chi-square, 183; to 
distribution of r, 249; curve, 32; de- 
scription, 33; graphs of, 111, 120, 136; 
reading table of, 31; related to chi-square,
101; to F, 189; to t, 136, 145;
Tables I, II, III, 454-57

Normal equations, 325; Doolittle method, 
326; Fisher-Doolittle method, 331; 
Kelley-Salisbury method, 345 

Normality, assumption of, 130, 143, 201, 
239, 265, 267, 273, 275, 280, 416, 423, 
426; test of, 119 

Normalization of distribution, of correla- 
tion, 254, 421; Table XXI, 481; of F, 
424; of ranks, 424 


Number of cases needed in sample, 69-76; 
163-67 


Observations, 8; independent, 14; paired, 

151, 190, 430, 432; random, 10 
One-sided and two-sided tests, 60 
Orthogonal comparisons, 356 


Parameter, 20; known, 239 

Parent population, 426 

Partial correlation, 340; order of correla- 
tion, 342; relation to multiple correla- 
tion, 344 

Percentile, 413; distribution of, 414; es- 
timation of mean by percentiles, 416, 
417; notation, 413; of standard devia- 
tion, 416, 417; standard error of, 414 

Phi coefficient, 272; comparison with tetrachoric,
274; relation to chi-square,
272 

Point biserial, 262; comparison with biserial,
271; confidence interval, 266

Poisson distribution, 424 

Population, 1, 9, 109; mean and variance 
of, 109; see also Distribution, popula- 
tion 

Power function, 63, 65, 73, 162; of test, 
62, 162; relation to level of significance, 
63; relation to size of sample, 72, 163 

Predictor variable, 315, 318, 324, 387 

Probability, 2, 3; and confidence, 55; 
density, 25; laws of combination, 15; 
symbolism, 13 

Probability distribution, 4, 8; of statistic, 
5; see also Distribution 

Product moment, 248 

Proportion, 9, 43; in binomial distribu- 
tion, 21; confidence interval for, 53, 68, 
72; Chart VI, 461; difference between 
two, 76; distribution in large samples, 
37, 67; estimation by a single value, 51; 
in finite population, 70; hypotheses, 44- 
51; power function, 73; sample size for 
confidence interval, 69; sample size in 
test of hypothesis, 72; transformation 
to angles, 423; two kinds of error, 60 


Quantile, 413 
Quartile, 413 


Random, numbers, 126; observations, 10; 
sample, 10, 171 

Randomized block experiment, 363 

Rank correlation, 278; efficiency of, 283;
as estimate of ρ, 282; relation to product
moment, 281; test of significance,
281

Ranks, analysis of variance by, 438; cor- 

relation between, 278, 284, 286; signed 


rank test, 432; sum of ranks test, 434, 
436; transformation to normal, 424; 
uniformization, 424 

Reciprocals, Table XXI, 481 

Region, of acceptance, 46; critical, 46; 
of rejection, 46; of significance, 46, 400, 
407; size of, 46 

Regression in bivariate population, among 
means, 396; in analysis of covariance, 
388, 398; analysis of variance in, 244; 
computation by machine, 233; compu- 
tation from scatter diagram, 237, 239; 
confidence intervals, 241; and correla- 
tion, 233; effect of measurement error 
on, 305; equation, 231; estimate, 231; 
linearity, test of, 245; list of formulas, 
259; mathematical model, 239; parti- 
tion of sums of squares, 242; precision 
of estimate, 244; residual error, 232; 
standard error of estimate, 240 

Regression, multiple, see Multiple regres- 
sion 

Rejection, region of, 46; values, 47 

Reliability, coefficient of, 297, 301; effect 
on correlation, 303; estimate from data, 
307; Kuder-Richardson formula, 311; 
relation to length of test, 302; Spearman- 
Brown prophecy formula, 303 

Residual, 236, 240, 242, 322, 393, 394 

Run test, 428 


Sample point, 50 

Sampling, biased, 172; cluster, 173, 175; 
design in surveys, 171; multistage, 173; 
random, 10, 172; simple, 171; stratified, 
173; systematic, 172; unit, 171 

Sampling distribution, see Distribution, 
sampling 

Score, components of, 293 

Sign test, 6, 430 

Signed test for paired observations, 432 

Significance level, 46; region, 46; relation 
to power, 63 

Size of region of significance, 46 

Slope of line of regression, 232 

Spearman-Brown prophecy formula, 303 

Square root, Table XXI, 481; transforma- 
tion, 424 

Squares, Table XXI, 481

Standard deviation, binomial distribution, 
29; computation of, 115; estimation 
from item analysis, 417; estimation by 
percentiles, 416; of probability distri- 
bution, 28; of sample, 114 

Standard error, of correlation coefficient, 
252; of correlation coefficient estimated 
from rank correlation, 282; of a differ- 
ence in two means, 153, 154, 156, 157; 
of a mean, 144, 145, 153, 157, 168, 174, 




177; of a median, 414; of a percentile, 
414; of a proportion, 30, 71, 175, 177; 
of a regression coefficient, 241; of the 
z-transformation, 254, 269 

Standard error of estimate, 240, 299 

Statistics, 5 

Statistical inference, 2 

Stratification, 173 

“Student’s” distribution, 135, 145; Table 
IX, 465 

Subscripts, 112, 196, 230 

Sum of ranks test, 434, 436 

Sum of squares, subdivision of, in analysis 
of covariance, 396, 405; in analysis of 
variance, 211, 355; in regression, 242, 
322 

Summation sign, 111 

Supply, see Population 

Surveys, design of samples in, 171 

Symbolism, see Glossary of symbols in 
Appendix 

Systematic selection, 172 


Tables giving basic data, marks of 42 
first-term statistics students and scores 
on a prognostic test, 235; preregistra- 
tion test scores of 447 college students on 
English test, Table XXIV, 486; scores of 
11 archers in a 3 X 3 layout, 364; scores 
of 36 subjects on memorization of piano 
music, classified on 4 criteria, 379; 
marks of 98 first-term statistics students 
on 3 criteria and scores on 3 predictor 
variables, 316; scores of 48 pupils in 
remedial reading classified in a 4 X 4 
layout, 350; scores for comparison of 
2 groups by matched regression esti- 
mates on 1 predictor variable, 388; sta- 
tistics for comparison of 2 groups by 
matched regression estimates on 2 pre- 
dictor variables, 411 

Tables giving formulas or symbols, com- 
ponent sums of squares and related 
degrees of freedom in analysis of covari- 
ance, 396; formulas for estimating mean 
from percentiles, Table XVII, 478; 
formulas for estimating standard devia- 
tion from percentiles, Table XVIII, 478; 
population values estimated in a two- 
way layout in analysis of variance, 362; 
population values estimated in a three- 
way layout in analysis of variance, 370; 
symbols for mean and variance of a sub- 
group, 199; symbols for scores of many 
individuals on many traits, 291 

Tables giving significance values, absolute 
values of smaller rank sum, 433; max- 
imum difference between two sample 
cumulative distributions, 427, 428; 




NDa for use in Kolmogorov-Smirnov 
test, 442; product moment r, Table XI, 
470; rank order r, Table XVI, 478 

Tables of probability, binomial, Tables 
IV, V, 458-60; chi-square, Table VIII, 
464; F ratio, Table X, 466; Fmax, Table 
VII, 462; normal, Tables I, II, III, 
454-56; sampling distribution of ob- 
served proportion p for N — 10 and 
various values of P, 49; "Student's" 
or t, Table IX, 465 

Tables to facilitate computation, loga- 
rithms, Table XXII, 482; r from pro- 
portions of success and failure, Table 
XIII, 472; r from tails of distribution, 
Chart XV, 477; random numbers, 
Table XXIII, 484; squares, square 
roots, and reciprocals, Table XXI, 481;
transformation of proportions to radi- 
ans, Table XIX, 479; transformation 
of r to z, Table XII, 471; transformation
of ranks to standard scores, Table XX, 
480

Tau, 286

t distribution, 135, 145; Table IX, 465 

Test, of hypothesis, 6; see also Hypothe- 
sis; mental test, 289; see also Measure- 
ment 

Tetrachoric correlation, 273; comparison 
with phi, 274 

Transformation, of biserial correlation, 
269; of correlation coefficient, 254; of 
F, 425; inverse sign, 424; logarithmic, 
424; of proportions, 423; of ranks, 424; 
square root, 424; uniformization, 425; 
z, 254


True component of score, 293; variance of, 
297 

Two sample tests, chi-square test, 99-104; 
distribution free, 427; F test for com- 
parison of variances, 185; Kolmogorov- 
Smirnov test, 426; median test, 435; 
run test, 428; sign test, 430; signed test 
for paired observations, 432; sum of 
ranks test, 434; t test, 151-60, 166;
test that ρ₁ = ρ₂, 255

Two-tailed test, see One-sided and two- 
sided tests 


Unbiased estimate, 119 

Unequal subclass frequencies, 381 
Uniformization, 425 

Universe, see Population 


Variable, continuous, 8; discrete, 8 

Variance, analysis of, comparison of sev- 
eral variances, 191; comparison of two 
variances from independent samples, 
185; comparison of two variances based 
on related scores, 190; confidence in- 
terval, 180; of distribution, 28; of 
proportion, 29; relation to chi-square, 
178; of a sample, 119; unbiased esti- 
mate, 119; see also Analysis of variance 


Yates’ correction, 105 


z transformation of correlation, 254; Table
XII, 471 

z* transformation of biserial correlation, 
269 


Index of Authors 


Air Training Command, Table XX 
Anderson, R. L., 386, 412 


Bancroft, T. A., 386, 412
Bartlett, M. S., 193, 194, 195, 425 
Biddle, W. W., 384, 386 

Bills, В. Е, 402, 404, 412 
Birnbaum, Z. W., 442, 449 

Bliss, C. I., 108, 386 

Bond, A. D., 409, 410, 411, 412 
Brown, W., 303, 310, 314 

Burt, C., 348, 350, 386 


Chesire, L., 274 

Clopper, C. J., 55, 80, Chart VI 

Cochran, W. G., 106, 108, 175, 177, 194, 
195 

Cowden, D. J., 260 

Cronbach, L. J., 308, 314 

Croxton, F. E., 260 

David, F. N., 80, 175, 177, 255, 260, 
Chart XIV 

David, H. A., Table VII 

Deming, W. E., 171, 175, 177

De Moivre, A., 32 

Dixon, W. J., 35, 36, 177, 195, 412, 422, 
449 

Doolittle, M. H., 327, 329, 331, 333, 335, 
343, 346 

DuBois, P. H., 287 

Dunlap, J. W., 287 

Durost, W. N., 239, 256, 260, 347 

Dwyer, P. S., 260, 347 


Edwards, A. L., 386, 425 
Eisenhart, C., 80, 108, 195, 425, Table XIX 
Ezekiel, M., 260, 347 


Ferguson, G. A., 312, 314 

Fisher, R. A., 105, 108, 126, 185, 186, 189, 
195, 250, 254, 260, 331, 333, 335, 386,
Tables IX and XI

Flanagan, J. C., 275, 287, Table XIII 

Florence, S., 341, 347 

Frankel, L. R., 425 

Friedman, M., 438, 449 


Gossett, W. S., 135 (See “Student” in 
subject index) 
Gulliksen, H., 314, 420, 422 


Hansen, M. H., 175, 177, 178 
Harrington, W., 212, 213, 214, 229 
Harris, R., 192, 194, 195 

Hartley, H. O., 183, 192, 195 

Hastay, M. W., 80, 108, 195, 425 
Hauser, P. M., 177 

Hedlund, P. A., 77, 80 

Hotelling, H., 257, 260, 281, 282, 287, 425 
Hoyt, C., 312, 314 

Hurwitz, W. N., 177, 178 


Interstate Commerce Commission, 126 


Jackson, D., 287 

Jackson, R. W. B., 312, 314 
Jessen, R. J., 178 

Johnson, N. L., 266, 287 
Johnson, P. O., 384, 386, 410, 412 


Kelley, T. L., 275, 287, 345, 347, 421, 422, 
Table II 

Kendall, M. G., 126, 282, 284, 286, 287, 
447 

King, A. J., 175, 178 

Kolmogorov, 441, 443 

Kruskal, W. H., 436, 449 

Kuder, G. F., 311, 314 


Lev, J., 206, 288 
Lewis, R. B., 348, 350, 386 
Lindquist, E. F., 308, 314, 412 
Long, J. A., 160, 178 

Lorge, I., Table XXI 


Mann, H. B., 435, 449 

Marks, E. S., 175, 178 

Massey, F. J., 35, 36, 177, 195, 412, 422, 
426, 441, 449 

McNemar, Q., 178, 347

Merrington, M., 186, 195, 425 

Miles, C. C., 160, 178 

Mills, F. C., 260 

Mood, A. M., 195, 372, 386, 412, 449 

Moore, G. H., 449 

Moses, L. E., 426, 449 

Mosteller, F., 275, 288, 420, 422, Tables 
XVII and XVIII 


National Bureau of Standards, 58, 80 
Neyman, J., 80, 108, 178, 384, 386, 410, 
412 


Olmstead, P. S., 447, 449 




Pabst, M. R., 287 

Paulson, E., 425 

Pearson, E. S., 55, 80, 108, 183, 425, 
Chart VI 


Pearson, Karl, 85, 108, 267, 269, 274, 288

Pease, K., 260, 288
Pitman, E. J. G., 288, 449


Quenouille, M. H., 412


Rao, C. R., 386, 412, 425
Richardson, M. W., 311, 314
Rope, F. T., 95, 99, 100, 108
Rubin-Rabson, G., 374, 386


Saffir, M., 274

Salisbury, F. S., 345, 347

Scheffé, H., 449

Schroeder, E. M., 200, 216, 229, 364, 386

Smirnov, N., 428, 440, 443, 450

Smith, B. B., 126, 449

Snedecor, G. W., 126, Table X

Spaney, E., 263, 271, 272, 288

Spearman, C., 280, 288, 303, 310, 314, 447

Stephan, F. F., 175, 178

Stouffer, S. A., 314

"Student," 135, 145 (See also "Student"
in subject index)

Symonds, P. M., 103, 104, 108


Tate, R. F., 269, 271, 288

Terman, L. M., 160, 178

Thompson, C. M., 186, 195, 425, Table
VIII

Thorndike, R. L., 308, 314

Thurstone, L. L., 274
Tippett, L. H. C., 126
Tolley, H. R., 347
Tukey, J., 445, 447, 449


Wald, A., 450 

Walker, H. M., 80, 108, 110, 195, 239, 
260, 347 

Wallis, W. A., 80, 108, 195, 288, 425, 436, 
449, 450 

Waugh, A. E., 260 

Welch, B. L., 158, 178, 266, 287

Wert, J. E., Table III 

Westover, Е. L., 154, 160, 178, 190, 195 

White, C., 428, 450 

Whitney, D. R., 435, 449 

Wilcoxon, F., 433, 435, 450 

Wilks, S. S., 80

Wolfowitz, J., 450 

Wrightstone, J. W., 152, 178 


Yates, F., 105, 106, 108, 126, 171, 178, 
186, 189, 195, Table XI 

